ANATOMY OF A YOUNG GIANT 
COMPONENT IN THE RANDOM GRAPH 



JIAN DING, JEONG HAN KIM, EYAL LUBETZKY AND YUVAL PERES 

Abstract. We provide a complete description of the giant component
of the Erdős-Rényi random graph $\mathcal{G}(n,p)$ as soon as it emerges from the
scaling window, i.e., for $p = (1+\varepsilon)/n$ where $\varepsilon^3 n \to \infty$ and $\varepsilon = o(1)$.

Our description is particularly simple for $\varepsilon = o(n^{-1/4})$, where the
giant component $\mathcal{C}_1$ is contiguous with the following model (i.e., every
graph property that holds with high probability for this model also holds
w.h.p. for $\mathcal{C}_1$). Let $Z$ be normal with mean $\frac{2}{3}\varepsilon^3 n$ and variance $\varepsilon^3 n$, and
let $\mathcal{K}$ be a random 3-regular graph on $2\lfloor Z \rfloor$ vertices. Replace each edge
of $\mathcal{K}$ by a path, where the path lengths are i.i.d. geometric with mean
$1/\varepsilon$. Finally, attach an independent Poisson($1-\varepsilon$)-Galton-Watson tree
to each vertex.

A similar picture is obtained for larger $\varepsilon = o(1)$, in which case the
random 3-regular graph is replaced by a random graph with $N_k$ vertices
of degree $k$ for $k \ge 3$, where $N_k$ has mean and variance of order $\varepsilon^k n$.

This description enables us to determine fundamental characteristics 
of the supercritical random graph. Namely, we can infer the asymptotics 
of the diameter of the giant component for any rate of decay of $\varepsilon$, as
well as the mixing time of the random walk on $\mathcal{C}_1$.

1. Introduction 

The Erdős-Rényi random graph $\mathcal{G}(n,p)$ has been studied extensively
since its introduction in 1959 [15]. Much of the analysis of this fundamental 
random graph model has focused on its behavior near the critical point 
p = 1/n. Nevertheless, a few key features, such as the diameter and the 
mixing time of the random walk on the largest component, have remained 
unknown in a regime just beyond criticality. 

In their seminal papers from the 1960's, Erdős and Rényi established a
phenomenon known as the double jump. For $p = c/n$ where $c < 1$ is fixed,
the largest component $\mathcal{C}_1$ has size $O(\log n)$ with high probability (w.h.p.).
When $c > 1$, the size of $\mathcal{C}_1$ is linear in $n$, and at the critical $c = 1$ it has order
$n^{2/3}$ (this latter fact was fully established much later by Bollobás [10] and
Łuczak [24]). As discovered in [10], the critical behavior extends throughout
the critical window, the regime where $p = (1 \pm \varepsilon)/n$ for $\varepsilon = O(n^{-1/3})$.

Up to the critical point, the structure of $\mathcal{C}_1$ is relatively well understood.
For instance, in the fully subcritical regime ($p = (1-\varepsilon)/n$ for $\varepsilon > 0$ fixed),
$\mathcal{C}_1$ is a tree of known (logarithmic) size and diameter. In the critical window
($\varepsilon = O(n^{-1/3})$) the distribution of $|\mathcal{C}_1|$ was determined in [1,26], and the
diameter was found in [28]. See [9,19] for further information.

In the supercritical regime ($p = (1+\varepsilon)/n$ with $\varepsilon^3 n \to \infty$), a variety of
methods can determine key features of $\mathcal{C}_1$ up to some continuous functions
of $\varepsilon$. While these functions remain bounded in the fully supercritical case
($\varepsilon > 0$ fixed), the situation becomes much more delicate as $\varepsilon$ approaches the
critical window.

For example, one can deduce that the diameter of the fully supercritical 
$\mathcal{C}_1$ has order $\log n$ merely by analyzing certain (weak) expansion properties
of its 2-core (formally defined in Section 2). More precise results on the 
diameter were obtained in [27,32], but they still do not give the asymptotic 
diameter in the whole supercritical regime. 

In the fully supercritical case, it is known that the giant component
consists of an expander, "decorated" using paths and trees of at most
logarithmic size (see [6] for a concrete example of such a statement, used there to
obtain the order of the mixing time on the fully supercritical $\mathcal{C}_1$). However,
the existing decompositions of the giant component are not precise enough
to handle the case where $\varepsilon \to 0$ (e.g., in [32] Riordan and Wormald point
out that this is the most difficult regime for determining the diameter).

In this work, we obtain a complete characterization of the supercritical 
giant component. Rather than merely describing its properties, we present 
a simple construction whose distribution is contiguous with that of $\mathcal{C}_1$. This
construction is particularly elegant when the giant component is "young",
namely when $\varepsilon = o(n^{-1/4})$. Since this is the hardest regime for alternative
approaches, we start by describing this special case. 

Let $\mathcal{N}(\mu, \sigma^2)$ denote the normal distribution with mean $\mu$ and variance $\sigma^2$,
and let Geom($\varepsilon$) denote the geometric distribution with mean $1/\varepsilon$.

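Theorem 1. Let $\mathcal{C}_1$ be the largest component of the random graph $\mathcal{G}(n,p)$
for $p = \frac{1+\varepsilon}{n}$, where $\varepsilon^3 n \to \infty$ and $\varepsilon = o(n^{-1/4})$. Then $\mathcal{C}_1$ is contiguous to
the model $\tilde{\mathcal{C}}_1$, constructed in 3 steps as follows:

1. Let $Z \sim \mathcal{N}(\frac{2}{3}\varepsilon^3 n, \varepsilon^3 n)$, and select a random 3-regular multigraph $\mathcal{K}$ on
$N = 2\lfloor Z \rfloor$ vertices.

2. Replace each edge of $\mathcal{K}$ by a path, where the path lengths are i.i.d. Geom($\varepsilon$).

3. Attach an independent Poisson($1-\varepsilon$)-Galton-Watson tree to each vertex.

That is, $\mathbb{P}(\tilde{\mathcal{C}}_1 \in A) \to 0$ implies $\mathbb{P}(\mathcal{C}_1 \in A) \to 0$ for any set of graphs $A$.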
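
The three steps of Theorem 1 translate directly into a sampler. The Python sketch below is illustrative only: the function names, the Knuth-style Poisson sampler, and the guard against a non-positive normal draw are assumptions of this sketch and do not come from the paper. The 3-regular multigraph is drawn by pairing half-edges uniformly at random (the configuration model).

```python
import math
import random


def sample_poisson(mean, rng):
    """Knuth's multiplication method; adequate for small means (here mean < 1)."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_geometric(p, rng):
    """Geometric variable on {1, 2, ...} with success probability p (mean 1/p)."""
    u = 1.0 - rng.random()  # uniform on (0, 1]
    return int(math.log(u) / math.log(1.0 - p)) + 1


def sample_young_giant(n, eps, seed=0):
    """Return (kernel size, 2-core size, total size, edge list) of the model."""
    rng = random.Random(seed)

    # Step 1: random 3-regular multigraph on N = 2*floor(Z) vertices,
    # obtained from a uniform perfect matching of the 3N half-edges.
    z = rng.gauss((2.0 / 3.0) * eps**3 * n, math.sqrt(eps**3 * n))
    num_kernel = 2 * max(1, math.floor(z))  # the guard is an assumption of this sketch
    half_edges = [v for v in range(num_kernel) for _ in range(3)]
    rng.shuffle(half_edges)
    kernel_edges = list(zip(half_edges[0::2], half_edges[1::2]))

    # Step 2: replace each kernel edge by a path of Geom(eps) many edges.
    edges = []
    next_vertex = num_kernel
    for u, v in kernel_edges:
        length = sample_geometric(eps, rng)
        inner = list(range(next_vertex, next_vertex + length - 1))
        next_vertex += length - 1
        path = [u] + inner + [v]
        edges.extend(zip(path, path[1:]))
    core_size = next_vertex

    # Step 3: attach a Poisson(1 - eps)-Galton-Watson tree to every 2-core vertex.
    frontier = list(range(core_size))
    while frontier:
        parent = frontier.pop()
        for _ in range(sample_poisson(1.0 - eps, rng)):
            edges.append((parent, next_vertex))
            frontier.append(next_vertex)
            next_vertex += 1
    return num_kernel, core_size, next_vertex, edges
```

Since every kernel edge of length $L$ adds $L$ edges and $L-1$ vertices, and every tree vertex adds exactly one edge, the sampled graph always has `total + num_kernel // 2` edges, which gives a quick sanity check on the output.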

In the above, a Poisson($\mu$)-Galton-Watson tree is the family tree of a
Galton-Watson branching process with offspring distribution Poisson($\mu$).

Two well-known objects relevant to the study of the giant component
are its 2-core $\mathcal{C}_1^{(2)}$ and its kernel $\mathcal{K}$. The 2-core of a graph is its maximum
subgraph where all degrees are at least 2. The kernel is obtained from the
2-core by replacing every maximal 2-path by an edge (where a 2-path is a
path where all internal vertices have degree 2). Note that our description
of $\mathcal{C}_1$ constructs the kernel in Step 1, the 2-core in Step 2 and the entire
component $\mathcal{C}_1$ in Step 3.

The above theorem not only states that the kernel of $\mathcal{C}_1$ in this regime
is an expander, but it is in fact contiguous to a random 3-regular graph,
an object whose expansion properties are well understood (cf., e.g., [18]).
Furthermore, the 2-core is obtained from the kernel by a simple operation
("stretching" the edges into paths of lengths i.i.d. geometric with mean $1/\varepsilon$).
This allows us to pinpoint the expansion properties of the 2-core and their
dependence on $\varepsilon$ as it tends to 0.

A few known (yet nontrivial) properties of the 2-core of $\mathcal{C}_1$ can be
immediately read off from Theorem 1. For instance, w.h.p. the 2-core contains
$(2+o(1))\varepsilon^2 n$ vertices while the kernel has $(\frac{4}{3}+o(1))\varepsilon^3 n$ vertices (see [25,31]).
As there are w.h.p. $(2+o(1))\varepsilon^3 n$ edges in the kernel, a simple estimate of
the maximum of i.i.d. geometric variables gives the following corollary.

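Corollary 1. Let $\mathcal{C}_1^{(2)}$ be the 2-core of the largest component of $\mathcal{G}(n,p)$ for
$p = \frac{1+\varepsilon}{n}$, where $\varepsilon^3 n \to \infty$ and $\varepsilon = o(n^{-1/4})$. The maximal 2-path in $\mathcal{C}_1^{(2)}$ has
length $(1/\varepsilon)\log(\varepsilon^3 n) + O_P(1/\varepsilon)$.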
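
The heuristic behind Corollary 1 is that the maximum of $m$ i.i.d. geometric variables with mean $1/\varepsilon$ sits near $(1/\varepsilon)\log m$, and here $m$ is about $2\varepsilon^3 n$. The following Python sketch checks this numerically; the function names and the chosen values of `n` and `eps` are illustrative assumptions only.

```python
import math
import random


def max_geometric(m, eps, rng):
    """Maximum of m i.i.d. geometric variables on {1, 2, ...} with mean 1/eps."""
    log_q = math.log(1.0 - eps)
    return max(int(math.log(1.0 - rng.random()) / log_q) + 1 for _ in range(m))


def predicted_max(n, eps):
    """Leading term (1/eps) * log(eps^3 * n) from Corollary 1."""
    return math.log(eps**3 * n) / eps


if __name__ == "__main__":
    rng = random.Random(0)
    n, eps = 10**7, 0.05
    m = int(2 * eps**3 * n)  # approximate number of kernel edges
    print(max_geometric(m, eps, rng), predicted_max(n, eps))
```

The two printed numbers should differ by a quantity of order $1/\varepsilon$, in line with the $O_P(1/\varepsilon)$ error term.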

Similarly, since a random 3-regular graph is Hamiltonian w.h.p. (see [33]),
we immediately deduce that in the above regime $\mathcal{C}_1$ contains a simple cycle
of length $(\frac{4}{3}+o(1))\varepsilon^2 n$. This matches the lower bound of Łuczak [25] on the
circumference of the supercritical random graph.

Moreover, Theorem 1 enables us to interpret distances in the 2-core as 
passage times in first-passage percolation (for further information on this 
thoroughly studied topic, see, e.g., [21]). As we state in Theorem 3 below,
this connection (used in a companion paper [12]) gives the asymptotic
behavior of the diameter throughout the regime $\varepsilon^3 n \to \infty$ and $\varepsilon = o(1)$.

1.1. Main results. We now state the extension of Theorem 1 to all $\varepsilon = o(1)$
outside the critical window. 

Theorem 2. Let $\mathcal{C}_1$ be the largest component of $\mathcal{G}(n,p)$ for $p = \frac{1+\varepsilon}{n}$ where
$\varepsilon^3 n \to \infty$ and $\varepsilon \to 0$. Let $\mu < 1$ denote the conjugate of $1+\varepsilon$, that is,
$\mu e^{-\mu} = (1+\varepsilon)e^{-(1+\varepsilon)}$. Then $\mathcal{C}_1$ is contiguous to the following model $\tilde{\mathcal{C}}_1$:

1. Let $\Lambda \sim \mathcal{N}(1+\varepsilon-\mu, \frac{1}{\varepsilon n})$ and assign i.i.d. variables $D_u \sim$ Poisson($\Lambda$)
($u \in [n]$) to the vertices, conditioned that $\sum_u D_u 1_{\{D_u \ge 3\}}$ is even. Let

$N_k = \#\{u : D_u = k\}$ and $N = \sum_{k \ge 3} N_k$.

Select a random multigraph $\mathcal{K}$ on $N$ vertices, uniformly among all
multigraphs with $N_k$ vertices of degree $k$ for $k \ge 3$.

2. Replace the edges of $\mathcal{K}$ by paths of lengths i.i.d. Geom($1-\mu$).

3. Attach an independent Poisson($\mu$)-Galton-Watson tree to each vertex.

That is, $\mathbb{P}(\tilde{\mathcal{C}}_1 \in A) \to 0$ implies $\mathbb{P}(\mathcal{C}_1 \in A) \to 0$ for any set of graphs $A$.

We note that conditioning that the sum of degrees is even can easily be
realized by rejection sampling. The differences between the two theorems are
the approximation of $1-\mu \approx \varepsilon$ in Steps 2, 3, and a richer degree distribution
of the random graph $\mathcal{K}$ in Step 1.
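
Both remarks can be checked numerically. The Python sketch below solves $\mu e^{-\mu} = (1+\varepsilon)e^{-(1+\varepsilon)}$ by bisection (the map $x \mapsto x e^{-x}$ is increasing on $[0,1]$) and realizes the parity conditioning of Step 1 by rejection sampling. All function names are hypothetical, and resampling $\Lambda$ together with the degrees on each rejection is this sketch's reading of the conditioning.

```python
import math
import random


def conjugate(eps, tol=1e-14):
    """The mu in (0, 1) with mu * exp(-mu) = (1 + eps) * exp(-(1 + eps))."""
    target = (1.0 + eps) * math.exp(-(1.0 + eps))
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mid * math.exp(-mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def sample_poisson(mean, rng):
    """Knuth's multiplication method; adequate for the small means used here."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_kernel_degrees(n, eps, rng):
    """Degrees D_u >= 3 from Step 1 of Theorem 2, with an even degree sum."""
    mu = conjugate(eps)
    while True:
        lam = rng.gauss(1.0 + eps - mu, math.sqrt(1.0 / (eps * n)))
        if lam <= 0:
            continue  # a Poisson rate must be positive; redraw
        degrees = [sample_poisson(lam, rng) for _ in range(n)]
        kernel_degrees = [d for d in degrees if d >= 3]
        if sum(kernel_degrees) % 2 == 0:
            return kernel_degrees  # accept; otherwise reject and resample
```

For small `eps` the value `1 - conjugate(eps)` is close to `eps`, which is the approximation $1-\mu \approx \varepsilon$ that separates Theorem 1 from Theorem 2.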

Further note that it was shown by Łuczak [25] that the kernel $\mathcal{K}$ in the
above regime is a random multigraph on a certain degree sequence, which is 
cubic except for a negligible number of vertices. However, in that description 
the vertex degrees and lengths of the 2-paths subdividing the kernel edges 
are all dependent, whereas in our contiguous model these are i.i.d. Poisson 
(Step 1) and i.i.d. Geometric (Step 2) respectively. 

Combining Theorem 2 with some known results on first-passage percola- 
tion from [7] gives an immediate corollary on the typical distances between 
vertices of degree at least 3 in the 2-core. 

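Corollary 2. Let $\mathcal{C}_1^{(2)}$ be the 2-core of the largest component of $\mathcal{G}(n,p)$ for
$p = \frac{1+\varepsilon}{n}$, where $\varepsilon^3 n \to \infty$ and $\varepsilon = o(1)$. Let $u, v$ be two vertices of degree at
least 3 in $\mathcal{C}_1^{(2)}$ chosen u.a.r. among all such vertices. The distance between
$u, v$ is w.h.p. $(1/\varepsilon + O(1))\log(\varepsilon^3 n)$.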

However, maximal distances in the 2-core can differ from typical distances; 
compare the above result to (1.3) in the next theorem, which we prove in a 
companion paper. 

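Theorem 3 ([12]). Consider the random graph $G \sim \mathcal{G}(n,p)$ for $p = \frac{1+\varepsilon}{n}$ where
$\varepsilon^3 n \to \infty$ and $\varepsilon = o(1)$. Let $\mathcal{C}_1$ be the largest component of $G$, let $\mathcal{C}_1^{(2)}$ be its
2-core and let $\mathcal{K}$ denote its kernel. Then w.h.p.,

$\mathrm{diam}(\mathcal{C}_1) = (3+o(1))(1/\varepsilon)\log(\varepsilon^3 n)$, (1.1)

$\mathrm{diam}(\mathcal{C}_1^{(2)}) = (2+o(1))(1/\varepsilon)\log(\varepsilon^3 n)$, (1.2)

$\max_{u,v \in \mathcal{K}} \mathrm{dist}_{\mathcal{C}_1^{(2)}}(u,v) = (\frac{5}{3}+o(1))(1/\varepsilon)\log(\varepsilon^3 n)$. (1.3)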

To prove the above theorem, we need to go beyond typical distances and 
obtain new large deviation estimates for the relevant parameters (see [12] 
for further details). The result (1.1) on the diameter of the giant component 
concludes a long list of studies of this parameter in the supercritical random 
graph (e.g., [11,16,27,32]). First results for the challenging regime where
$\varepsilon = o(1)$ appeared only recently: Riordan and Wormald [32] obtained very
accurate estimates of the diameter for most of this regime, but did not cover
the range where the random graph emerges from the critical window (i.e.,
$\varepsilon^3 n$ tends to $\infty$ arbitrarily slowly). Łuczak and Seierstad [27] then gave
estimates for the diameter that do apply to the entire supercritical regime, 
yet their upper and lower bounds differ by a factor of 

Controlling typical and maximal distances between vertices in the giant 
component is but one of several prerequisites for estimating the mixing time 
of the (lazy) random walk on $\mathcal{C}_1$. For instance, as this parameter is highly
sensitive to bottlenecks in $\mathcal{C}_1$, one also needs to fully understand the
isoperimetric profile of the 2-core and the structure of the trees attached to it.

In the fully supercritical case, Fountoulakis and Reed [17] and Benjamini,
Kozma and Wormald [6] independently proved that the mixing time on
$\mathcal{C}_1$ is of order $\log^2 n$. However, as evident from the structure description in
Theorem 2, methods for the fully-supercritical case that depend on large sets
in the 2-core having edge expansion bounded away from 0 will break down as
$\varepsilon \to 0$. Within the critical window, it was shown in [28] that the mixing time
on $\mathcal{C}_1$ has order $n$. For $\varepsilon = o(1)$ outside the critical window, the problem of
estimating the mixing time on $\mathcal{C}_1$ remained open, and furthermore, it was
unclear what the answer should be, as one would expect some interpolation
between $\log^2 n$ for fixed $\varepsilon > 0$ and order $n$ at criticality.

The following theorem, proved in a companion paper, settles this problem 
by exploiting the geometric understanding of $\mathcal{C}_1$ provided by Theorem 2.
This completes the picture of the supercritical mixing time. 

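Theorem 4 ([13]). Let $\mathcal{C}_1$ be the largest component of $\mathcal{G}(n,p)$ for $p = \frac{1+\varepsilon}{n}$,
where $\varepsilon^3 n \to \infty$ and $\varepsilon = o(1)$. With high probability, the mixing time of the
lazy random walk on $\mathcal{C}_1$ is of order $(1/\varepsilon^3)\log^2(\varepsilon^3 n)$.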

Indeed, the mixing time exhibits a smooth evolution from the critical
regime $\varepsilon = O(n^{-1/3})$ to the fully-supercritical regime of $\varepsilon > 0$ fixed.

1.2. Main techniques. A key ingredient in the proofs is the Poisson cloning
model $\mathcal{G}_{PC}(n,p)$, introduced in [23] and shown to be contiguous to $\mathcal{G}(n,p)$
(see Section 2). It thus suffices to establish the contiguity of our model $\tilde{\mathcal{C}}_1$ to
the giant component of Poisson cloning, a fact we establish in several stages.

We first show the contiguity of the 2-cores in the models through a careful 
analysis of $\mathcal{G}_{PC}(n,p)$. We then perform a series of contiguous translations
of the model, in order to remove dependencies between maximal 2-paths
in the 2-core, as well as incorporate the trees attached to the 2-core in $\mathcal{C}_1$.
To establish these, we use local central limit results for various parameters, 
including a powerful local CLT of Pittel and Wormald [31].

1.3. Organization. Section 2 contains several preliminary facts needed for
the proofs. In Section 3 we reduce the 2-core of the Poisson cloning model to 
an intermediate simplified model (the proof of a technical lemma on Poisson 
cloning used here is postponed to Section 7). This model is subsequently 
reduced in Section 4 to one that is essentially the 2-core of our model $\tilde{\mathcal{C}}_1$.
The complete structure of the giant component is thereafter analyzed in 
Section 5, which concludes the proof of Theorem 2. In Section 6 we prove 
Theorem 1, addressing the special case of the early giant component. 

2. Preliminaries 

2.1. Cores and kernels. The $k$-core of a graph $G$, denoted by $G^{(k)}$, is its
maximum subgraph $H \subset G$ where every vertex has degree at least $k$. It is
well known (and easy to verify) that this subgraph is unique, and can be 
obtained by repeatedly deleting any vertex whose degree is smaller than k 
(at an arbitrary order). 
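
The peeling procedure just described can be sketched in a few lines of Python. The function name `k_core` and the edge-list input format are assumptions of this sketch, which also assumes a simple graph (no loops or multiple edges).

```python
from collections import defaultdict


def k_core(edges, k):
    """Vertex set of the k-core of a simple graph given as a list of edges."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    degree = {v: len(neighbours) for v, neighbours in adj.items()}

    # Repeatedly delete any vertex whose current degree is below k.
    stack = [v for v, d in degree.items() if d < k]
    removed = set()
    while stack:
        v = stack.pop()
        if v in removed:
            continue
        removed.add(v)
        for w in adj[v]:
            if w not in removed:
                degree[w] -= 1
                if degree[w] < k:
                    stack.append(w)
    return set(adj) - removed
```

For a triangle with a pendant path attached, `k_core(edges, 2)` returns the triangle, whatever order the deletions happen in.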

We call a path $P = v_0, v_1, \ldots, v_k$ for $k > 1$ (i.e., a sequence of vertices
with $v_i v_{i+1}$ an edge for each $i$) a 2-path if and only if $v_i$ has degree 2 for all
$i = 1, \ldots, k-1$ (while the endpoints $v_0, v_k$ may have degree larger than 2,
and possibly $v_0 = v_k$).

The kernel $\mathcal{K}$ of $G$ is obtained by taking its 2-core $G^{(2)}$ minus its disjoint
cycles, then repeatedly contracting all 2-paths (replacing each by a single
edge). Notice that, by definition, the degree of every vertex in $\mathcal{K}$ is at least
3. At certain times the notation $\ker(G)$ will be useful to denote a kernel
with respect to some specific graph G. 
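
As an illustration of this contraction, the Python sketch below walks along each maximal 2-path between vertices of degree at least 3 and records its length. It assumes a simple input graph with minimum degree at least 2 and comparable vertex labels; the name `kernel_edges` and the output format are hypothetical.

```python
from collections import defaultdict


def kernel_edges(core_edges):
    """Contract maximal 2-paths of a simple graph with minimum degree >= 2.

    Returns a list of (u, v, length), one entry per kernel edge, where u and v
    have degree >= 3 and `length` is the number of edges on the 2-path.
    Components that are plain cycles contain no such vertex and are skipped.
    """
    adj = defaultdict(list)
    for u, v in core_edges:
        adj[u].append(v)
        adj[v].append(u)
    branch = {v for v, neighbours in adj.items() if len(neighbours) >= 3}

    result = []
    for start in branch:
        for first in adj[start]:
            prev, cur, length = start, first, 1
            while cur not in branch:  # internal vertices have degree exactly 2
                a, b = adj[cur]
                prev, cur = cur, (b if a == prev else a)
                length += 1
            # Each 2-path is traversed once from each end; keep one copy.
            if (start, first) < (cur, prev):
                result.append((start, cur, length))
    return result
```

Applied to the 2-core, the returned triples describe the kernel as a multigraph together with the lengths of the 2-paths that were contracted.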

2.2. Configuration model. This model, introduced by Bollobas [8], pro- 
vides a remarkable method for constructing random graphs with a given 
degree distribution, which is highly useful to their analysis. We describe 
this for the case of random d-regular graphs for d fixed (the model is similar 
for other degree distributions); see [9,19,35] for additional information. 

Associate each of the n vertices with d distinct points (also referred to 
as "half-edges"), and consider a uniform perfect matching on these points. 
The random d-regular graph is obtained by contracting each cluster of the 
d points corresponding to a vertex, possibly introducing multiple edges and 
self-loops. Clearly, on the event that the obtained graph is simple, it is 
uniformly distributed among all d-regular graphs, and furthermore, one can 
show that this event occurs with probability bounded away from 0 (namely,
with probability about $\exp(\frac{1-d^2}{4})$). Hence, every event that occurs w.h.p.
for this model, also occurs w.h.p. for a random d-regular graph. 

One particularly useful property of the above model is that it allows one 
to construct the graph gradually, exposing the edges of the matching one by
one. This way, having exposed part of the graph, the edges on the remaining
unmatched points are still distributed as a uniform perfect matching. 
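
A minimal Python sketch of the configuration model for $d$-regular graphs follows; the function names are illustrative assumptions. Shuffling the list of points and pairing consecutive entries produces a uniform perfect matching. The estimator at the end compares the empirical frequency of simple outcomes with the classical asymptotic value $\exp(\frac{1-d^2}{4})$.

```python
import math
import random


def configuration_model(n, d, rng):
    """Edges of a random d-regular multigraph on n vertices (n * d must be even)."""
    points = [v for v in range(n) for _ in range(d)]  # d half-edges per vertex
    rng.shuffle(points)
    return list(zip(points[0::2], points[1::2]))


def is_simple(edges):
    """True if the multigraph has no self-loops and no multiple edges."""
    seen = set()
    for u, v in edges:
        key = (min(u, v), max(u, v))
        if u == v or key in seen:
            return False
        seen.add(key)
    return True


def estimate_simple_probability(n, d, trials, seed=0):
    rng = random.Random(seed)
    hits = sum(is_simple(configuration_model(n, d, rng)) for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    print(estimate_simple_probability(50, 3, 2000), math.exp((1 - 3**2) / 4))
```

For $d = 3$ the limiting value is $e^{-2} \approx 0.135$, so a constant fraction of samples is simple; this is what makes conditioning on simplicity harmless for w.h.p. statements.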

2.3. Poisson cloning model. In order to analyze the delicate structure
of the near-critical giant component, we need to use the Poisson cloning model
$\mathcal{G}_{PC}(n,p)$, which was introduced in [23]. We give a brief account of the
Poisson cloning model below; see [22] and [23] for more details.

Let $V$ be the set of $n$ vertices, and let Po($\lambda$) denote a Poisson random variable
with mean $\lambda$. Let $\{d(v)\}_{v \in V}$ be a sequence of i.i.d. Po($\lambda$) variables with
$\lambda = (n-1)p$. Then, take $d(v)$ copies of each vertex $v \in V$; the copies of
$v$ are called clones of $v$ or simply $v$-clones. Define $N_\lambda = \sum_{v \in V} d(v)$.

If $N_\lambda$ is even, the multigraph $\mathcal{G}_{PC}(n,p)$ is obtained by generating a uniform
random perfect matching of those $N_\lambda$ clones (e.g., via the configuration
model, where every clone is considered to be a half-edge) and contracting
clones of the same vertex. That is to say, each matching of a $v$-clone and a
$w$-clone is translated into the edge $(v,w)$ with multiplicity. In the case that
$v = w$, it contributes a self-loop with degree 2. On the other hand, if $N_\lambda$
is odd, we first pick a uniform clone and translate it to a special self-loop
contributing degree 1 of the corresponding vertex. For the remaining clones,
we generate a perfect matching and contract them as in the $N_\lambda$ even case.
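
The construction above can be sketched in Python as follows. The function names and the return format are assumptions of this sketch; the special self-loop of the odd case is returned separately as the vertex that owns the leftover clone.

```python
import math
import random


def sample_poisson(mean, rng):
    """Knuth's multiplication method; adequate for means of order 1."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def poisson_cloning(n, p, seed=0):
    """Return (degrees, edges, special), where `special` is the vertex carrying
    the degree-1 self-loop when the number of clones is odd, else None."""
    rng = random.Random(seed)
    lam = (n - 1) * p
    degrees = [sample_poisson(lam, rng) for _ in range(n)]
    clones = [v for v in range(n) for _ in range(degrees[v])]

    # A uniform shuffle followed by pairing consecutive clones is a uniform
    # perfect matching; popping the last clone first removes a uniform clone.
    rng.shuffle(clones)
    special = clones.pop() if len(clones) % 2 == 1 else None
    edges = list(zip(clones[0::2], clones[1::2]))
    return degrees, edges, special
```

The vertex degrees are independent by construction, which is the feature that distinguishes this model from $\mathcal{G}(n,p)$.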

The following theorem of [23] states that the Poisson cloning model is
contiguous with the Erdős-Rényi model. Hence, it suffices to study the Poisson
cloning model in order to establish properties of the Erdős-Rényi model.

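Theorem 2.1 ([23, Theorem 1.1]). Suppose $p = \Theta(n^{-1})$. Then there exist
constants $c_1, c_2 > 0$ such that for any collection $\mathcal{F}$ of simple graphs, we have

$c_1 \mathbb{P}(\mathcal{G}_{PC}(n,p) \in \mathcal{F}) \le \mathbb{P}(\mathcal{G}(n,p) \in \mathcal{F}) \le c_2 \left( \mathbb{P}(\mathcal{G}_{PC}(n,p) \in \mathcal{F})^{1/2} + e^{-n} \right)$.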

Note that in our regime ($p = \frac{1+\varepsilon}{n}$ for $\varepsilon = o(1)$ and $\varepsilon^3 n \to \infty$) we may
replace the rate $\lambda = (n-1)p$ in the Poisson-cloning model definition simply
by $\lambda = np$, for convenience.

3. The 2-core of Poisson cloning 

By the results of [23], the random graph $\mathcal{G}(n,p)$ in our range of parameters
is contiguous to the Poisson cloning model, where every vertex gets an i.i.d.
Po($np$) number of half-edges (clones), and the final (multi)graph is obtained
thereafter via the configuration model. As opposed to $\mathcal{G}(n,p)$, the Poisson
cloning model features vertex degrees that are independently distributed, 
often contributing to an easier analysis. Nevertheless, the structure of the 
2-core in this model just beyond criticality is still highly nontrivial.

The main goal in this section is to reduce the 2-core of the supercritical
Poisson cloning model to the following tractable model, which is simply a 
random graph uniformly chosen over all graphs with a given degree sequence. 

Definition 3.1 (Poisson-configuration model for $n$ and $p = \frac{1+\varepsilon}{n}$).

(1) Let $\Lambda \sim \mathcal{N}(1+\varepsilon-\mu, \frac{1}{\varepsilon n})$ and assign an independent variable $D_u \sim$
Po($\Lambda$) to each vertex $u$. Let $N_k = \#\{u : D_u = k\}$ and $N = \sum_{k \ge 2} N_k$.

(2) Construct a random graph on $N$ vertices, uniformly chosen over all
graphs with $N_k$ degree-$k$ vertices for $k \ge 2$ (if $N$ is odd, choose a vertex
$u$ with $D_u = k \ge 2$ with probability proportional to $k$, and give it $k-1$
half-edges and a self-loop).

Theorem 3.2. Let $G \sim \mathcal{G}_{PC}(n,p)$ be generated by the Poisson cloning model
for $p = \frac{1+\varepsilon}{n}$, where $\varepsilon \to 0$ and $\varepsilon^3 n \to \infty$. Let $G^{(2)}$ be its 2-core, and $H$ be
generated by the Poisson-configuration model corresponding to $n, p$. Then
for any set of graphs $A$ such that $\mathbb{P}(H \in A) \to 0$, we have $\mathbb{P}(G^{(2)} \in A) \to 0$.

In order to prove Theorem 3.2, in what follows we review a
specific way to generate $\mathcal{G}_{PC}(n,p)$, introduced in [23]. Let $V$ be a set of $n$
vertices and let

$\lambda = np = 1 + \varepsilon$,

be the mean of the degree. Consider $n$ horizontal line segments ranging
from $(0,j)$ to $(\lambda,j)$, for $j = 1, \ldots, n$ in $\mathbb{R}^2$. Assign a Poisson point process
with rate 1 on each line segment independently. Each point $(x, v)$ in these
processes is referred to as a $v$-clone with the assigned number $x$. The entire
set of Poisson point processes is called a Poisson $\lambda$-cell.

Given the Poisson A-cell, there are various schemes to generate a perfect 
matching on all points (thus yielding a random graph). One such way is 
the "Cut-Off Line Algorithm" (COLA), defined in [22], which is useful in 
finding the 2-core $G^{(2)}$. We next describe this algorithm in detail.

First define $\theta_\lambda$ to be the unique positive solution to the following equation:

$\theta = 1 - e^{-\theta\lambda}$. (3.1)

It is straightforward to verify that

$\theta_\lambda = (2+o(1))\varepsilon$. (3.2)
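
The estimate (3.2) is easy to confirm numerically. The Python sketch below finds the positive root of $\theta = 1 - e^{-\theta\lambda}$ by bisection; the function name, the bracketing interval, and the restriction to $1 < \lambda \le 1.5$ are assumptions of this sketch. It also uses the standard identity $\mu = \lambda(1-\theta_\lambda)$ for the conjugate $\mu$ of $\lambda$, so that $\theta_\lambda\lambda = 1+\varepsilon-\mu$.

```python
import math


def theta(lam, tol=1e-14):
    """Positive root of theta = 1 - exp(-theta * lam), assuming 1 < lam <= 1.5."""
    def g(t):
        # g(t) = t - (1 - exp(-t * lam)); expm1 avoids cancellation for small t.
        return t + math.expm1(-t * lam)

    lo, hi = lam - 1.0, 1.0  # g(lo) < 0 < g(hi) in the assumed range of lam
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for eps in (0.1, 0.01, 0.001):
        lam = 1.0 + eps
        th = theta(lam)
        mu = lam * (1.0 - th)  # conjugate of lam
        print(eps, th / eps, mu * math.exp(-mu) - lam * math.exp(-lam))
```

The printed ratio `th / eps` approaches 2 as `eps` decreases, and the last column stays at rounding-error level, confirming that `mu` is indeed the conjugate.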
Let $\beta$ be some real, to be specified later, satisfying

$\frac{1}{3} < \beta < \frac{1}{2}$. (3.3)

Next, construct a Poisson $\lambda$-cell as follows. The COLA procedure consists of
multiple phases, formally defined in Algorithm 1 below. Throughout these
phases, the algorithm maintains the position of a "cut-off line", a vertical
line in $\mathbb{R}^2$ whose initial $x$-coordinate equals $\lambda$, and gradually moves leftwards.
The $j$-th phase ($j \ge 1$) begins when the line is at $(1-\beta)^{j-1}\lambda$ and ends once
it reaches $(1-\beta)^{j}\lambda$.

The result of each phase is a matching on (previously unmatched) clones. 
In order to describe the rule of constructing this matching, we need the 
following definitions. At any given point, we call a vertex $v \in V$ (and its
unmatched clones) light if it has at most one unmatched clone and heavy
otherwise. Furthermore, for each $j$, we label each vertex (and its unmatched
clones) at the beginning of phase $j$ as either $j$-active or $j$-passive, as follows.
A vertex $v \in V$ (and its clones) is $j$-passive if it has precisely 2 clones to
the left of the cut-off line, and both are unmatched. This partition of the 
unmatched clones into j-active and j-passive ones remains fixed throughout 
phase j. 

At the beginning of the process, all the light clones are placed in a stack 
(whose state is maintained without being re-initialized after each phase).
The order by which these clones are inserted into the stack can be arbitrary, 
as long as it is oblivious of the values assigned to the clones. 

Algorithm 1 Cut-Off Line Algorithm: phase j description 

1. As long as the stack is nonempty, repeat the following: 

• Let $(u, i)$ be the first clone in the stack.

• Move the cut-off line leftwards until one of the following occurs:

(a) If the line hits $(1-\beta)^{j}\lambda$, the phase is concluded (quit).

(b) The line hits an unmatched clone $(v, j) \neq (u, i)$.

• Remove $(u, i)$ from the stack, as well as $(v, j)$ (if it is there).

• Match $(u, i)$ and $(v, j)$, and re-evaluate $u$ and $v$ as light/heavy.

• Add any clone that just became light into the stack. 

2. If there are active unmatched clones: 

(a) Choose such a clone uniformly at random and put it in the stack. 

(b) Return to Step (1). 

Otherwise, the algorithm is concluded (no additional phases). 



Define $\Lambda_C$ to be the $x$-coordinate of the cut-off line once Step 2 is reached
for the first time in the course of the algorithm, i.e., at the first time when
there are no light clones. The next theorem states that $\Lambda_C$ is concentrated
about $\theta_\lambda \lambda$ with a standard deviation of $1/\sqrt{\varepsilon n}$.

Before giving the explicit statement on the concentration of $\Lambda_C$, we elaborate
on its important role in understanding the structure of the 2-core of
the graph. Until reaching Step 2 for the first time, the above algorithm
repeatedly matches light clones until all of them are exhausted, precisely
as the cut-off line reaches $\Lambda_C$. As stated in Section 2, the $k$-core of a graph
can be obtained by repeatedly removing vertices of degree at most $k-1$
(in an arbitrary order). Therefore, the 2-core consists precisely of all
the unmatched clones at the moment we reach $\Lambda_C$. Crucially, continuing
the algorithm will further reveal the inner structure of the 2-core, and these
further steps are equivalent to running the configuration model on the clones
to the left of $\Lambda_C$.

The following theorem gives tight concentration bounds for $\Lambda_C$. Its proof
relies on a delicate analysis of the above mentioned Algorithm 1, and we 
postpone it to Section 7. 

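Theorem 3.3. [Upper bound on the window of $\Lambda_C$] There exists some constant
$c > 0$ so that for all $\gamma > 0$ with $\gamma = o(\varepsilon)$, the following holds:

$\mathbb{P}(|\Lambda_C - \theta_\lambda \lambda| \ge \gamma) \le e^{-c\gamma^2 \varepsilon n}$. (3.4)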

3.1. Size of the 2-core and its disjoint cycles. Using the above theorem,
we will now be able to characterize the structure of $G^{(2)}$. Indeed, by the
discussion preceding the theorem, the 2-core of Poisson-cloning given that
$\Lambda_C = \ell$ has the same distribution as the graph generated by the
Poisson-configuration model given that $\Lambda = \ell$. The above theorem implies that
w.h.p. we only need to consider

$\ell = (1+o(1))\theta_\lambda \lambda = (2+o(1))\varepsilon$,

and our next step is to estimate the basic properties of the 2-core (size, the
number of vertices that comprise disjoint cycles) and its kernel on this event.

The next proposition thus applies not only to the Poisson-configuration 
model but also to the 2-core of Poisson-cloning. The term expander used 
here refers (informally) to a graph where the ratio between the boundary 
and volume of each set is bounded from below by some constant $c > 0$ (a
precise definition appears below). 

Proposition 3.4. Let $H$ be generated by the Poisson-configuration model
given $\Lambda = \ell$, where $\ell = (2+o(1))\varepsilon$. Define $H'$ as the graph obtained by
deleting every disjoint cycle from $H$. Let $N_2$ be the number of vertices with
degree 2 in $H$, and $N_2'$ be the corresponding quantity for $H'$. Then w.h.p.

$N_2 = (2+o(1))\varepsilon^2 n$, $N_2' = (1+o(1))N_2$.

In addition, w.h.p. the kernel $\mathcal{K}$ of $H$ is an expander graph with

$|\mathcal{K}| = (\frac{4}{3}+o(1))\varepsilon^3 n$, $|E(\mathcal{K})| = (2+o(1))\varepsilon^3 n$.

The first step in the proof is to establish the size of the kernel $\mathcal{K} = \ker(H)$,
as well as show that it is an expander. This latter fact is of independent
interest and will have important applications, e.g., for the mixing time of
the random walk on $\mathcal{C}_1$. In what follows, for a subset $S$ of the vertices of a
graph $G$, we let

$d_G(S) = \sum_{v \in S} d_G(v)$

denote the sum of the degrees of its vertices (also referred to as the volume
of $S$ in $G$). Further define the isoperimetric number of a graph $G$, denoted
by $i(G)$, as

$i(G) = \min\left\{ \frac{e(S,S^c)}{d_G(S)} : S \subset V(G),\ d_G(S) \le |E(G)| \right\}$,

where $e(S,S^c)$ is the number of edges between $S$ and its complement $S^c$.

We say that $G$ is a $c$-edge-expander for some fixed $c > 0$ iff $i(G) \ge c$.
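
For very small graphs the isoperimetric number can be computed by brute force, which is a useful way to check the definition. The Python sketch below enumerates all vertex subsets and is exponential in the number of vertices; the function name and input format are assumptions. A self-loop adds 2 to the degree of its vertex and nothing to the boundary.

```python
from itertools import combinations


def isoperimetric_number(num_vertices, edges):
    """min e(S, S^c) / d_G(S) over nonempty S with 0 < d_G(S) <= |E(G)|."""
    degree = [0] * num_vertices
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1

    best = float("inf")
    for size in range(1, num_vertices):
        for subset in combinations(range(num_vertices), size):
            members = set(subset)
            volume = sum(degree[v] for v in members)
            if volume == 0 or volume > len(edges):
                continue
            boundary = sum(1 for u, v in edges if (u in members) != (v in members))
            best = min(best, boundary / volume)
    return best
```

On the complete graph $K_4$ the minimum is attained by a pair of vertices (boundary 4, volume 6), and on the 6-cycle by three consecutive vertices (boundary 2, volume 6).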

Lemma 3.5. For $\mathcal{K}$ the kernel of $H$ as defined in Proposition 3.4, w.h.p.

$|\mathcal{K}| = (\frac{4}{3}+o(1))\varepsilon^3 n$, $|E(\mathcal{K})| = (2+o(1))\varepsilon^3 n$,

and $\mathcal{K}$ is an $\alpha$-edge-expander for some constant $\alpha > 0$.

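Proof. By Definition 3.1, the kernel consists of exactly those vertices $u \in V$
that have $D_u \ge 3$. Combining this with the assumption that $\Lambda = (2+o(1))\varepsilon$,
it follows that $|\mathcal{K}| \sim \mathrm{Bin}(n, p_3^+(\Lambda))$, where

$p_3^+(\Lambda) = \sum_{k \ge 3} e^{-\Lambda}\frac{\Lambda^k}{k!} = \frac{\Lambda^3}{6}(1+o(1)) = (\frac{4}{3}+o(1))\varepsilon^3$.

Since $\varepsilon^3 n \to \infty$, we get that w.h.p.

$|\mathcal{K}| = (\frac{4}{3}+o(1))\varepsilon^3 n$. (3.5)

Similarly, the total sum of degrees in $\mathcal{K}$ is simply $\sum_{u : D_u \ge 3} D_u$, and therefore
$2|E(\mathcal{K})|$ is the sum of $n$ i.i.d. variables distributed as $Y \sim \mathrm{Po}(\Lambda) 1_{\{\mathrm{Po}(\Lambda) \ge 3\}}$. A
similar calculation to the one above now gives that $\mathbb{E}Y = (4+o(1))\varepsilon^3$, and
so (by CLT) w.h.p.

$|E(\mathcal{K})| = (2+o(1))\varepsilon^3 n$. (3.6)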
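
The two Poisson estimates above are easy to verify numerically. The Python sketch below evaluates the tail sums directly for $\Lambda = 2\varepsilon$ and compares them with $\frac{4}{3}\varepsilon^3$ and $4\varepsilon^3$; the function names and the truncation at 40 terms are assumptions of this sketch.

```python
import math


def poisson_pmf(x, k):
    """P(Po(x) = k)."""
    return math.exp(-x) * x**k / math.factorial(k)


def p3_plus(x, terms=40):
    """P(Po(x) >= 3), with the series truncated after `terms` values of k."""
    return sum(poisson_pmf(x, k) for k in range(3, terms))


def mean_truncated(x, terms=40):
    """E[Po(x) * 1{Po(x) >= 3}], with the same truncation."""
    return sum(k * poisson_pmf(x, k) for k in range(3, terms))


if __name__ == "__main__":
    for eps in (0.1, 0.01, 0.001):
        lam = 2.0 * eps
        print(eps, p3_plus(lam) / (4.0 / 3.0 * eps**3), mean_truncated(lam) / (4.0 * eps**3))
```

Both printed ratios tend to 1 as `eps` decreases, matching (3.5) and (3.6) after multiplying by $n$ (and halving the degree sum to count edges).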

To show that the isoperimetric number is bounded away from 0, we apply
standard techniques used to analyze the configuration model (for definitions,
see Subsection 2.2), while assuming (3.5) and (3.6). Let $0 < \alpha < \frac{1}{4}$ be
specified later, and let

$D = 2|E(\mathcal{K})| = (4+o(1))\varepsilon^3 n$

be the total number of points to be matched in the configuration model. We
will next prove a lower bound $i(\mathcal{K}) \ge \alpha > 0$ for the case where $D$ is even,
and under the relaxed condition that perhaps one of the vertices of $\mathcal{K}$ has
degree 2 (all others have degree 3 or more).

To see that this gives a bound on $i(\mathcal{K})$ for $D$ odd, recall that in that
case precisely one of the $D$ points will have a self-loop (by Definition 3.1).
Clearly, omitting one point produces a kernel as handled above (with an
isoperimetric number at least $\alpha$), and reintroducing it (to a vertex with at
least 2 other points) would give $i(\mathcal{K}) \ge \frac{2}{3}\alpha$.

Consider the probability that $\{e(S,S^c) \le \alpha d_{\mathcal{K}}(S)\}$, where $S$ is a subset
of $\mathcal{K}$ with $d_{\mathcal{K}}(S) = s$. This is precisely the probability that $k \le \alpha s$ points
out of the $s$ that comprise $S$ are matched with points in $S^c$, whereas the
remaining $s-k$ points form a perfect matching. Thus,

$\mathbb{P}(e(S,S^c) \le \alpha s) = \sum_{k=0}^{\alpha s} \mathbb{P}(e(S,S^c) = k) 1_{\{k \equiv s \pmod 2\}}$

smi:(:)(-*)"C';>'(«-»-^-'"- 

fc=0 ^ ^ ^ ^ 

where we used the fact that (^^{y] < '^\frn for sufficiently large m, and that 



^J{D - s- k){s - k) > - 2k > ^/D/2, as k < as < s/4 and s < D/2. 

A standard application of Stirling's formula gives nil = 0((ri!)^/^n"^/^). 
Hence, for some constant ci > 0, 

ne(S S^) < as) < V " ^ - ^)^^^ sl{s - k)^^ {D - s)! 

' ^ - - (Z)!)i/2Di/4 kl{{s - fc)!)i/2 {{D-s- /fc)!)i/2 

iD-s-k)is-k)Y/Uil){^^^)Y/' 



k=0 
as 

k=0 



D 

as / (s\(D^s\s.^l/2 
fc=0 ^ \s) ^ 



(?) 



It is well known that, by Stirling's formula, $\binom{n}{k} \asymp \sqrt{\frac{n}{k(n-k)}} \exp\left[H(\frac{k}{n})\,n\right]$,

where $H(x)$ is the entropy function $H(x) = -x\log x - (1-x)\log(1-x)$.
Thus, for some constant $c_2 > 0$,

as 

¥{e{S,S^) < as) < eM^(l)^+^(i^)(^-)"^(*)^] 

where we applied the fact that < . Recalling that each vertex of $\mathcal{K}$
has degree at least 3, except for possibly one vertex of degree 2, we have

$s = d_{\mathcal{K}}(S) \ge 3|S| - 1$, $D \ge 3|\mathcal{K}| - 1$,

and it follows that $|S| \le (s+1)/3$. Thus,



P(e(5, 5^=) < as) < ^ Yl ne{S,S')<as) 



S:dK.{S)=s 1<S±1 S:dK:{S)=s 

\S\=l 

Since $D = (3+o(1))|\mathcal{K}|$, any $s \le D/2$ satisfies $s/3 \le (\frac{1}{2}+o(1))|\mathcal{K}|$. Therefore,
another application of the above estimate of the binomial coefficient gives
that for some constant $c_3 > 0$,

Y IP(e(5,5^) < as) < c3,5/2^H(^+o(i))|^|+i[H(.).+H(ag.)D-H(^)D] 

S:dK.{S)=s 

^ ^^^5/2g(i+o(l))H(-§)D+i[H(a)-§+//(2g£)_//(-5y)]D _ 

It is then clear that we can choose a sufficiently small $\alpha > 0$ such that

P(e(S,S^) < as) < C3s5/V^^(*)^ < cgs^/V^ . 

Combined with the fact that $D = (4+o(1))\varepsilon^3 n \to \infty$ by (3.5), and summing
over the possible values of $s$, we deduce that

$\mathbb{P}(\exists S \subset \mathcal{K},\ d_{\mathcal{K}}(S) \le |E(\mathcal{K})|,\ e(S,S^c) \le \alpha d_{\mathcal{K}}(S)) = o(1)$,

as required. ■

Proof of Proposition 3.4. The required statement on the typical number
of vertices and edges in the kernel, $|\mathcal{K}|$ and $|E(\mathcal{K})|$ resp., has already been
established in Lemma 3.5 (along with the expansion properties of the kernel).
It remains to show that, w.h.p., $N_2$ and $N_2'$ are both $(2+o(1))\varepsilon^2 n$, where
$N_2$ is the number of degree-2 vertices in the 2-core, and $N_2'$ equals $N_2$ minus
the number of vertices that belong to disjoint cycles in the 2-core.

The main issue left is to distinguish between H and H' (the graph before 
and after removing its disjoint cycles). 

Let $p_2(x)$ be the probability that a Po($x$) variable equals 2:

$p_2(x) = e^{-x}\frac{x^2}{2}$.

By Definition 3.1 and our assumption that $\Lambda = (2+o(1))\varepsilon$,

$N_2 \sim \mathrm{Bin}(n, p_2(\Lambda))$, and $p_2(\Lambda) = (2+o(1))\varepsilon^2$.

Hence, by a standard concentration argument (using the fact that $\varepsilon^3 n \to \infty$)
we deduce that $N_2 = (2+o(1))\varepsilon^2 n$ w.h.p. Assume therefore that this is
the case (i.e., there are $(2+o(1))\varepsilon^2 n$ vertices of degree 2 in $H$), and that
$|E(\mathcal{K})| = (2+o(1))\varepsilon^3 n$.

We next consider the disjoint cycles in $H$. For a given degree-2 vertex
$v$ in $H$, let $A_{v,k}$ denote the event that $v$ belongs to such a disjoint cycle
whose length is $k$, and further let $A_v = \bigcup_k A_{v,k}$. To form a disjoint cycle
of length $k$ via the configuration model, we must repeatedly match points
of degree-2 vertices, that is (noticing that $2|E(\mathcal{K})|$ counts the total number
of points to be matched via the configuration model):

1 27V2-2j-l 
^ ""'^^ ~ 2N2 + 2\E{1C)\ - 2k + 1 2N2 + 2\E{iq\ - 2j - 1 

1 yr 2N2-2j-l 

2N2 + 2\E{1C)\ - 1 j}^ 2N2 + 2\E{1C)\ - 2j - 3 

<^(l-(l + o(l)).)-\ 

since the terms in the above product over j formed a decreasing sequence. 
Summing over the values of k, 

k=l k=l 

<i±|W.i±^ = i±i<ll=„(l), (3.7) 
It then follows that w.h.p. N2 = (1— o(l))A'^2 = (2+o(l))e^n, as required. ■ 
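
The matching argument behind (3.7) can be checked numerically. The following is a minimal sketch, not part of the original proof: the function names are hypothetical, and the inputs `n2` and `kernel_edges` stand for $N_2$ and $|E(\mathcal K)|$. It evaluates the exact product for $\mathbb P(A_{v,k})$ and sums it over $k$; bounding every factor of the shifted product by its first (largest) term gives the elementary bound $\mathbb P(A_v) < 1/(2|E(\mathcal K)| - 1)$, which is the non-asymptotic counterpart of $(1+o(1))/(4\epsilon^3 n)$.

```python
def cycle_probability(n2, kernel_edges, k):
    """P(A_{v,k}): a fixed degree-2 vertex v lies on a disjoint cycle of length k."""
    total_points = 2 * n2 + 2 * kernel_edges
    prob = 1.0 / (total_points - 2 * k + 1)  # final step: close the cycle at v
    for j in range(k - 1):
        # the next point must belong to an unused degree-2 vertex other than v
        prob *= (2 * n2 - 2 * j - 2) / (total_points - 2 * j - 1)
    return prob


def disjoint_cycle_probability(n2, kernel_edges):
    """P(A_v), summing P(A_{v,k}) over k with a running product (linear time)."""
    total_points = 2 * n2 + 2 * kernel_edges
    running, total = 1.0, 0.0
    for k in range(1, n2 + 1):
        total += running / (total_points - 2 * k + 1)
        # extend the product by the factor with index j = k - 1
        running *= (2 * n2 - 2 * (k - 1) - 2) / (total_points - 2 * (k - 1) - 1)
    return total
```

With $N_2 \approx 2\epsilon^2 n$ and $|E(\mathcal K)| \approx 2\epsilon^3 n$, the value returned by `disjoint_cycle_probability` can be compared directly with $1/(4\epsilon^3 n)$.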

3.2. Contiguity of Poisson-cloning and Poisson-configuration. A key
part of showing the contiguity result is the following counterpart to
Theorem 3.3; together, the two results imply that $\Lambda_C$ has a tight
concentration window of order $1/\sqrt{\epsilon n}$.

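Theorem 3.6. [Lower bound on the window of $\Lambda_C$] There exists some
constant $c > 0$ such that for any $t = t(n) > 0$ and fixed $\delta > 0$,

\[ \mathbb P\Big( t \le \Lambda_C \le t + \frac{\delta}{\sqrt{\epsilon n}} \Big) \le c\,\delta . \tag{3.8} \]

Proof. The results of the previous subsection imply that, for some suitably
large constant $M = M(\delta) > 0$,

\[ \mathbb P\Big( |\Lambda_C - \lambda\theta_\lambda| \ge \frac{M}{\sqrt{\epsilon n}} \Big) \le \delta . \]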

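It follows that (3.8) holds trivially for any $c \ge 1$ when
$|t - \lambda\theta_\lambda| \ge 2M/\sqrt{\epsilon n}$. Therefore, we assume in what follows that
$|t - \lambda\theta_\lambda| \le 2M/\sqrt{\epsilon n}$. Denote by $A$ the event
$\{t \le \Lambda_C \le t + \delta/\sqrt{\epsilon n}\}$. Recall the fact that
$|G^{(2)}| \sim \mathrm{Bin}(n, p_2^+(\Lambda_C))$, where $G^{(2)}$ is the 2-core of the
Poisson-cloning model and $p_2^+(x)$ stands for the probability for a $\mathrm{Po}(x)$
variable to be at least 2. Standard analytical arguments (note that
$\frac{d}{dx}p_2^+(x) = x e^{-x}$ and $t = (2+o(1))\epsilon$) give that

\[ p_2^+\Big(t + \frac{\delta}{\sqrt{\epsilon n}}\Big) - p_2^+(t) \le (2+o(1))\,\delta\sqrt{\epsilon/n} . \]

Now, an application of CLT implies that for some interval $B$ of length
$4\delta\sqrt{\epsilon n}$,

\[ \mathbb P\big( |G^{(2)}| \in B \mid A \big) = 1 - o(1) . \tag{3.9} \]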



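Consider $\mathcal C_1^{(2)}$, the 2-core of the giant component in the Poisson-cloning
model. Recalling (3.7) (which, as in the discussion before Proposition 3.4,
also applies to the Poisson-cloning model), we know that w.h.p. only an
$\omega/(\epsilon^3 n)$ fraction of the vertices in $G^{(2)}$ appear in disjoint cycles, where
$\omega = (\epsilon^3 n)^{1/4}$ ($\to \infty$). Therefore, we have

\[ |\mathcal C_1^{(2)}| = \Big(1 - O\Big(\frac{\omega}{\epsilon^3 n}\Big)\Big) |G^{(2)}| = |G^{(2)}| - O(\omega/\epsilon) = |G^{(2)}| - o(\sqrt{\epsilon n}) . \]

Together with (3.9), we conclude that there exists an interval $B'$ with length
$5\delta\sqrt{\epsilon n}$ such that

\[ \mathbb P\big( |\mathcal C_1^{(2)}| \in B' \mid A \big) = 1 - o(1) . \tag{3.10} \]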

Now, for the 2-core $H$ of the giant component of the Erdős-Rényi graph
$\mathcal G(n,p)$, it is known (see [31, Theorem 6], reformulated here in Theorem 5.1)
that $|H|$ is in the limit Gaussian with variance $(12+o(1))\epsilon n$. Therefore,

\[ \mathbb P(|H| \in B') \le 5\delta . \]

Combining this with contiguity of Poisson-cloning and $\mathcal G(n,p)$ (as stated in
Theorem 2.1), we obtain that for some constant $c_0 > 0$,

\[ \mathbb P(|\mathcal C_1^{(2)}| \in B') \le 5 c_0 \delta . \]

The proof is completed by choosing $c = 5c_0 + 1$ and applying (3.10). ■

Using the above estimate for $\Lambda_C$, we are now able to conclude the main
result of this section, which reduces the 2-core of Poisson-cloning to the
graph generated by the Poisson-configuration model.

Proof of Theorem 3.2. Recall that $H$ is the random graph generated by
the Poisson-configuration model, and $G^{(2)}$ is the 2-core of Poisson-cloning.
Let $\delta > 0$, and set

\[ B = \Big( \lambda\theta_\lambda - \frac{M}{\sqrt{\epsilon n}} ,\ \lambda\theta_\lambda + \frac{M}{\sqrt{\epsilon n}} \Big) , \]

where $M = M(\delta)$ is a sufficiently large constant such that
$\mathbb P(\Lambda_C \in B) \ge 1 - \delta$. Further define

\[ D = \{ x : \mathbb P(H \in \mathcal A \mid \Lambda = x) \ge \delta \} . \]

Since $\mathbb P(H \in \mathcal A) = o(1)$, we obtain that $\mathbb P(\Lambda \in D) \to 0$, and consequently

\[ \mathbb P(\Lambda \in B \cap D) = o(1) . \]

Recalling that $\Lambda \sim \mathcal N(\lambda\theta_\lambda, 1/(\epsilon n))$, we deduce that
$\mathcal L(B \cap D)\sqrt{\epsilon n} \to 0$, where $\mathcal L(\cdot)$ stands for the Lebesgue measure
on $\mathbb R$. At this point, Theorem 3.6 gives that $\mathbb P(\Lambda_C \in B \cap D) \to 0$.
Recalling that $G^{(2)}$ and $H$ are generated by the same scheme (and hence have
the same distribution) given the event $\Lambda_C = \Lambda$, we obtain that

\[ \mathbb P(G^{(2)} \in \mathcal A) \le 2\delta + o(1) , \]

as required. ■



4. Constructing the 2-core of the random graph 

In the previous section, we have shown that the 2-core of Poisson-cloning
is contiguous to a simpler model, which we called the Poisson-configuration
model (see Definition 3.1). The goal of this section is to reduce the
Poisson-configuration model to the following, where here and in what follows,
$\mu$ is defined to be the conjugate of $\lambda = 1 + \epsilon$. That is to say, $\mu < 1$ and

\[ \mu e^{-\mu} = \lambda e^{-\lambda} . \tag{4.1} \]

Definition 4.1 (Poisson-geometric model for $n$ and $p = \frac{1+\epsilon}{n}$).

(1) Let $\Lambda \sim \mathcal N\big(1+\epsilon-\mu, \frac{1}{\epsilon n}\big)$ and assign an independent $\mathrm{Po}(\Lambda)$
variable $D_u$ to each vertex $u$. Let $N_k = \#\{u : D_u = k\}$ and $N = \sum_{k \ge 3} N_k$.

(2) Construct a random graph $\mathcal K$ on $N$ vertices, uniformly chosen over all
graphs with $N_k$ degree-$k$ vertices for $k \ge 3$ (if $\sum_{k\ge 3} k N_k$ is odd, choose
a vertex $u$ with $D_u = k \ge 3$ with probability proportional to $k$, and give
it $k-1$ half-edges and a self-loop).

(3) Replace the edges of $\mathcal K$ by paths of length i.i.d. $\mathrm{Geom}(1-\mu)$.
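
The three steps of Definition 4.1 can be sketched in code. The block below is an illustration under stated assumptions rather than the authors' construction: all function names are hypothetical, the kernel is drawn by a uniform random matching of half-edges (the configuration model, which may produce loops and multiple edges), and an odd degree sum is handled by resampling the degrees instead of the self-loop adjustment described in step (2). It also assumes $\epsilon n$ is large enough that the sampled $\Lambda$ is positive.

```python
import math
import random


def conjugate(lam):
    """Solve mu * exp(-mu) = lam * exp(-lam) for mu in (0, 1) by bisection."""
    target = math.log(lam) - lam          # compare log(x) - x, increasing on (0, 1)
    lo, hi = 1e-12, 1.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if math.log(mid) - mid < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def poisson(rng, mean):
    """Knuth's multiplication sampler; adequate for the small means used here."""
    threshold, k, prod = math.exp(-mean), 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def sample_poisson_geometric(n, eps, seed=0):
    rng = random.Random(seed)
    lam = 1 + eps
    mu = conjugate(lam)
    # Step (1): Lambda ~ N(1 + eps - mu, 1/(eps n)), then D_u ~ Po(Lambda).
    big_lambda = rng.gauss(lam - mu, 1 / math.sqrt(eps * n))
    while True:
        degrees = [d for d in (poisson(rng, big_lambda) for _ in range(n)) if d >= 3]
        if sum(degrees) % 2 == 0:         # simplification: resample on odd degree sum
            break
    # Step (2): kernel from a uniform random matching of the half-edges.
    half_edges = [v for v, d in enumerate(degrees) for _ in range(d)]
    rng.shuffle(half_edges)
    kernel_edges = list(zip(half_edges[0::2], half_edges[1::2]))
    # Step (3): i.i.d. path lengths with P(L = k) = mu**(k-1) * (1 - mu), k >= 1.
    path_lengths = [1 + int(math.log(1 - rng.random()) / math.log(mu))
                    for _ in kernel_edges]
    return degrees, kernel_edges, path_lengths
```

The total number of edges of the resulting 2-core is `sum(path_lengths)`, which is the quantity whose distribution is compared across the two models in the rest of this section.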

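Theorem 4.2. Let $H$ be generated by the Poisson-configuration model w.r.t.
$n$ and $p = \frac{1+\epsilon}{n}$, where $\epsilon \to 0$ and $\epsilon^3 n \to \infty$. Let $\tilde H$ be generated by the
Poisson-geometric model corresponding to $n, p$. Then for any set of graphs
$\mathcal A$ such that $\mathbb P(\tilde H \in \mathcal A) \to 0$, we have $\mathbb P(H \in \mathcal A) \to 0$.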

Clearly, both models have the same kernel, and they only differ in the 
way this kernel is thereafter expanded to form the entire graph (replacing 
edges by paths). To prove the above statement, we need to estimate the 
distribution of the total number of edges in each of the models; we will show 
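that they are in fact contiguous.

4.1. Edge distribution in the Poisson-configuration model.

Lemma 4.3. Let $N_k$ denote the number of degree-$k$ vertices in the Poisson-
configuration model, and set $\Lambda_0 = \lambda - \mu$. For any fixed $M > 0$ there exist
some $c_1, c_2 > 0$ such that the following holds: If $n_3, n_4, \ldots$ satisfy

\[
\begin{aligned}
\Big| n\Big(1 - e^{-\Lambda_0}\big(1 + \Lambda_0 + \tfrac{\Lambda_0^2}{2}\big)\Big) - \sum_{k\ge 3} n_k \Big| &\le M\sqrt{\epsilon^3 n} , \\
\Big| n\Lambda_0\Big(1 - e^{-\Lambda_0}(1 + \Lambda_0)\Big) - \sum_{k\ge 3} k\, n_k \Big| &\le M\sqrt{\epsilon^3 n} ,
\end{aligned}
\]

and $x$ satisfies $|x - \Lambda_0| \le \frac{M}{\sqrt{\epsilon n}}$, then

\[ c_1 \le \frac{\mathbb P(N_k = n_k \text{ for all } k \ge 3 \mid \Lambda = x)}{\mathbb P(N_k = n_k \text{ for all } k \ge 3 \mid \Lambda = \Lambda_0)} \le c_2 . \]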
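Proof. Throughout the proof of the lemma, the implicit constants in the
$O(\cdot)$ notation depend on $M$.

Write $m = \sum_{k\ge 3} n_k$ and $r = \sum_{k \ge 3} k\, n_k$, and let $A = A(n_3, n_4, \ldots)$ denote
the event $\{N_k = n_k \text{ for all } k \ge 3\}$. As usual, we use the abbreviations

\[ p_k(x) = \mathbb P(\mathrm{Po}(x) = k) = e^{-x} x^k / k! , \quad\text{and}\quad p_k^-(x) = \mathbb P(\mathrm{Po}(x) < k) . \]

It follows that

\[ \frac{\mathbb P(A \mid \Lambda = x)}{\mathbb P(A \mid \Lambda = \Lambda_0)} = \Big(\frac{p_3^-(x)}{p_3^-(\Lambda_0)}\Big)^{n-m} \prod_{k \ge 3} \Big(\frac{p_k(x)}{p_k(\Lambda_0)}\Big)^{n_k} = e^{n(\Lambda_0 - x)} \Big(\frac{1 + x + \frac{x^2}{2}}{1 + \Lambda_0 + \frac{\Lambda_0^2}{2}}\Big)^{n-m} \Big(\frac{x}{\Lambda_0}\Big)^{r} , \]

and so

\[ \log \frac{\mathbb P(A \mid \Lambda = x)}{\mathbb P(A \mid \Lambda = \Lambda_0)} = n(\Lambda_0 - x) + (n-m)\log\Big(\frac{1 + x + \frac{x^2}{2}}{1 + \Lambda_0 + \frac{\Lambda_0^2}{2}}\Big) + r \log \frac{x}{\Lambda_0} . \]

Using Taylor's expansion and recalling that $x - \Lambda_0 = O(1/\sqrt{\epsilon n}) = o(\Lambda_0)$,

\[ \log\Big(\frac{1 + x + \frac{x^2}{2}}{1 + \Lambda_0 + \frac{\Lambda_0^2}{2}}\Big) = \frac{1+\Lambda_0}{1 + \Lambda_0 + \frac{\Lambda_0^2}{2}} (x - \Lambda_0) - O\big(\Lambda_0 (x - \Lambda_0)^2\big) = \frac{1+\Lambda_0}{1 + \Lambda_0 + \frac{\Lambda_0^2}{2}} (x - \Lambda_0) + O(1/n) , \]

and we deduce that

\[ \log \frac{\mathbb P(A \mid \Lambda = x)}{\mathbb P(A \mid \Lambda = \Lambda_0)} = n(\Lambda_0 - x) + (n-m)\frac{1+\Lambda_0}{1 + \Lambda_0 + \frac{\Lambda_0^2}{2}} (x - \Lambda_0) - O(1) + r\,\frac{x - \Lambda_0}{\Lambda_0} - O\big(r/(\epsilon^3 n)\big) . \]

Our assumptions on $m, r$ now yield that

\[ \log \frac{\mathbb P(A \mid \Lambda = x)}{\mathbb P(A \mid \Lambda = \Lambda_0)} = n(\Lambda_0 - x) + n e^{-\Lambda_0}(1+\Lambda_0)(x - \Lambda_0) + n\big(1 - e^{-\Lambda_0}(1+\Lambda_0)\big)(x - \Lambda_0) + O(1) = O(1) , \]

completing the proof.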

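Fix $M > 0$, and let $\mathcal B_M$ denote the following set of "good" kernels:

\[ \mathcal B_M = \left\{ \mathcal K :\ \begin{array}{l} \Big| |\mathcal K| - n\Big(1 - e^{-\Lambda_0}\big(1 + \Lambda_0 + \tfrac{\Lambda_0^2}{2}\big)\Big) \Big| \le M\sqrt{\epsilon^3 n} \\[4pt] \Big| |E(\mathcal K)| - \tfrac12 n \Lambda_0\Big(1 - e^{-\Lambda_0}(1+\Lambda_0)\Big) \Big| \le M\sqrt{\epsilon^3 n} \end{array} \right\} . \tag{4.2} \]

Let $f_\Lambda(\cdot \mid \cdot)$ denote the density function of $\Lambda$ given the kernel $\mathcal K$ (or
equivalently, given its degree sequence). By applying Bayes' formula, the
above lemma gives that

\[ \frac{f_\Lambda(x \mid \mathcal K)}{f_\Lambda(\Lambda_0 \mid \mathcal K)} = \Theta(1) \]

for all $\mathcal K \in \mathcal B_M$ and $x$ in the interval
$I_M = \big[\Lambda_0 - \frac{M}{\sqrt{\epsilon n}},\ \Lambda_0 + \frac{M}{\sqrt{\epsilon n}}\big]$. Clearly, by
volume considerations, this implies that for some $c = c(M) > 0$ we have

\[ f_\Lambda(x \mid \mathcal K) \le c\sqrt{\epsilon n} \quad \text{for all } x \in I_M \text{ and } \mathcal K \in \mathcal B_M . \tag{4.3} \]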

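Lemma 4.4. Define $M > 0$, $I_M$ and $\mathcal B_M$ as above. Let $H$ be generated by
the Poisson-configuration model. There exists some constant $c = c(M) > 0$
so that for any $\mathcal K \in \mathcal B_M$ and $s$ with $\big|s - \frac n2 (\Lambda_0 - e^{-\Lambda_0}\Lambda_0)\big| \le M\sqrt{\epsilon n}$,

\[ \mathbb P\big(|E(H)| = s ,\ \Lambda \in I_M \mid \ker(H) = \mathcal K\big) \le \frac{c}{\sqrt{\epsilon n}} . \]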

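Proof. Let $x \in I_M$ and $\mathcal K \in \mathcal B_M$, and write $m = |\mathcal K|$ and $r = |E(\mathcal K)|$ for the
number of vertices and the edges in the kernel respectively. We will first
estimate $\mathbb P(|E(H)| = s \mid \Lambda = x ,\ \ker(H) = \mathcal K)$, and the required inequality
will then readily follow from an integration over $x \in I_M$.

Note that, given $\Lambda = x$ and $\ker(H) = \mathcal K$, the number of edges in $H$ is the
$r$ edges of $\mathcal K$ plus an added edge for each degree-2 variable out of the $n - m$
variables $D_u$ (i.i.d. $\mathrm{Po}(x)$) that have $D_u \le 2$. That is, in this case

\[ |E(H)| \sim r + \mathrm{Bin}\Big(n - m ,\ \frac{x^2/2}{1 + x + x^2/2}\Big) , \]

and therefore,

\[ \mathbb P(|E(H)| = s \mid \Lambda = x ,\ \ker(H) = \mathcal K) = \binom{n-m}{s-r} \Big(\frac{x^2/2}{1 + x + x^2/2}\Big)^{s-r} \Big(\frac{1+x}{1 + x + x^2/2}\Big)^{n-m-(s-r)} . \]

Write

\[ q = q(x) = \frac{x^2/2}{1 + x + x^2/2} , \]

and define

\[ q_0 = \frac{\Lambda_0^2/2}{1 + \Lambda_0 + \Lambda_0^2/2} \quad\text{and}\quad t = \frac{s - r}{n - m} . \]

Since $x \in I_M$ and $\mathcal K \in \mathcal B_M$, we have

\[ q = q_0 + O(\sqrt{\epsilon/n}) , \quad\text{and}\quad t = q_0 + O(\sqrt{\epsilon/n}) . \tag{4.4} \]

Using Stirling's formula, we obtain that

\[
\begin{aligned}
\mathbb P(|E(H)| = s \mid \Lambda = x ,\ \ker(H) = \mathcal K) &\le \Big(\frac{(1+o(1))(n-m)}{2\pi (s-r)(n - m - (s-r))}\Big)^{1/2} \Big(\frac{q}{t}\Big)^{s-r} \Big(\frac{1-q}{1-t}\Big)^{n-m-s+r} \\
&\le \frac{1}{\sqrt{s-r}}\, e^{(n-m) g_t(q)} ,
\end{aligned}
\]

where $g_t(q)$ is given by

\[ g_t(q) = -t\log t - (1-t)\log(1-t) + t \log q + (1-t)\log(1-q) . \]

It is easy to verify that

\[ g_t'(q) = \frac{t}{q} - \frac{1-t}{1-q} , \qquad g_t''(q) = -\frac{t}{q^2} - \frac{1-t}{(1-q)^2} , \]

and so for any $q, t$ satisfying (4.4) we have

\[ g_t(t) = 0 , \quad g_t'(t) = 0 , \quad g_t''(q) = -\frac{1+o(1)}{q_0} . \]

Thus, for any large $n$ (absorbing the $o(1)$-term in the constant) we have

\[ g_t(q) \le -\frac{1}{3 q_0}(q - t)^2 . \tag{4.5} \]

Clearly, the function $q(x)$ satisfies

\[ q'(x) = \frac{x(2+x)}{2(1 + x + x^2/2)^2} , \]

and in particular $q$ is strictly monotone increasing from 0 to 1 for $x \in [0, \infty)$.
Thus, there exists a unique $x_t > 0$ such that $q(x_t) = t$. Noticing that for all
$x = (1+o(1))\Lambda_0$ we have $q'(x) = (1+o(1))\Lambda_0 = (2+o(1))\epsilon$, it follows that
for any $M_1 > 0$ one can choose $M_2 > 0$ such that

\[ \Big[ q_0 - M_1\sqrt{\epsilon/n} ,\ q_0 + M_1\sqrt{\epsilon/n} \Big] \subset q\Big( \Big[ \Lambda_0 - \frac{M_2}{\sqrt{\epsilon n}} ,\ \Lambda_0 + \frac{M_2}{\sqrt{\epsilon n}} \Big] \Big) , \]

and in particular, $x_t = \Lambda_0 + O(1/\sqrt{\epsilon n})$. We can now apply the Mean Value
Theorem to $q$ in (4.5) and obtain that

\[ g_t(q) \le -\frac{1}{3q_0}\big(q(x) - q(x_t)\big)^2 = -\frac{(1+o(1))\Lambda_0^2}{3 q_0}(x - x_t)^2 \le -\frac12 (x - x_t)^2 , \]

where the last inequality holds for any sufficiently large $n$ (recall that
$q_0 = (\frac12 + o(1))\Lambda_0^2$). Altogether, absorbing the change from $(n - m)$ to $n$
in the constant, we conclude that

\[ \mathbb P(|E(H)| = s \mid \Lambda = x ,\ \ker(H) = \mathcal K) \le \frac{1}{\sqrt{s-r}}\, e^{-\frac12 (n-m)(x - x_t)^2} \le \frac{1}{\sqrt{\epsilon^2 n}}\, e^{-\frac13 n (x - x_t)^2} . \tag{4.6} \]

It remains to integrate the above conditional probability over $x \in I_M$.
Combining (4.3) and (4.6), we obtain that for some constant $c > 0$,

\[
\begin{aligned}
\mathbb P(|E(H)| = s ,\ \Lambda \in I_M \mid \ker(H) = \mathcal K) &\le \int_{I_M} \mathbb P(|E(H)| = s \mid \Lambda = x ,\ \ker(H) = \mathcal K)\, f_\Lambda(x \mid \mathcal K)\, dx \\
&\le \frac{c\sqrt{\epsilon n}}{\sqrt{\epsilon^2 n}} \int_{-\infty}^{\infty} e^{-\frac13 n (x - x_t)^2}\, dx \le \frac{c}{\sqrt{\epsilon n}} ,
\end{aligned}
\]

as required. ■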
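
The Stirling step in the proof above rests on a bound that holds for every binomial point probability: with $t = K/N$, one has $\binom{N}{K} q^K (1-q)^{N-K} \le \sqrt{N/(2\pi K(N-K))}\, e^{N g_t(q)}$. The sketch below (hypothetical helper names, standard library only) compares the exact probability with this bound; it is meant as a sanity check of the inequality, not as a reproduction of the constants used in the proof.

```python
import math


def g(t, q):
    """g_t(q) = -t log t - (1-t) log(1-t) + t log q + (1-t) log(1-q)."""
    return (-t * math.log(t) - (1 - t) * math.log(1 - t)
            + t * math.log(q) + (1 - t) * math.log(1 - q))


def binomial_pmf(N, K, q):
    """Exact P(Bin(N, q) = K), evaluated in log-space for stability."""
    log_pmf = (math.lgamma(N + 1) - math.lgamma(K + 1) - math.lgamma(N - K + 1)
               + K * math.log(q) + (N - K) * math.log(1 - q))
    return math.exp(log_pmf)


def stirling_bound(N, K, q):
    """Upper bound sqrt(N / (2 pi K (N-K))) * exp(N * g_t(q)) with t = K / N."""
    t = K / N
    return math.sqrt(N / (2 * math.pi * K * (N - K))) * math.exp(N * g(t, q))
```

Since $g_t(t) = 0$ and $g_t$ is strictly concave in $q$, the exponential factor is what produces the Gaussian-type decay in $(x - x_t)$ in (4.6).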

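4.2. Edge distribution in the Poisson-geometric model.

Lemma 4.5. Let $M > 0$ and $\mathcal B_M$ be as in (4.2). Let $\tilde H$ be generated by the
Poisson-geometric model. There exists some constant $c = c(M) > 0$ so that
for any $\mathcal K \in \mathcal B_M$ and $s$ with $\big|s - \frac n2(\lambda - \mu)\big(1 - \frac\mu\lambda\big)\big| \le M\sqrt{\epsilon n}$,

\[ \mathbb P\big(|E(\tilde H)| = s \mid \ker(\tilde H) = \mathcal K\big) \ge \frac{c}{\sqrt{\epsilon n}} . \]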

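Proof. By definition, given that $\ker(\tilde H) = \mathcal K$, the variable $|E(\tilde H)|$ is the sum
of $|E(\mathcal K)|$ i.i.d. geometric variables with mean $1/(1-\mu)$.

Denote by $r$ the number of edges in the kernel $\mathcal K$, and let $s$ be a candidate
for the number of edges in the expanded 2-core $\tilde H$. As stated in the lemma
(recall definition (4.2)), we are interested in the following range for $r, s$:

\[
\begin{aligned}
r &= \frac n2 (\lambda - \mu)\Big(1 - \frac\mu\lambda\Big)(1 - \mu) + c_1\sqrt{\epsilon^3 n} , \quad (|c_1| \le M) , \\
s &= \frac n2 (\lambda - \mu)\Big(1 - \frac\mu\lambda\Big) + c_2\sqrt{\epsilon n} , \quad (|c_2| \le M) .
\end{aligned}
\]

In this case, we have that

\[ \frac{s-1}{r-1} = \frac{1}{1-\mu} + \frac{c_2\sqrt{\epsilon n} - c_1\sqrt{\epsilon^3 n}/(1-\mu) + \mu/(1-\mu)}{\frac n2(\lambda-\mu)\big(1-\frac\mu\lambda\big)(1-\mu) + c_1\sqrt{\epsilon^3 n} - 1} = \frac{1+\delta}{1-\mu} , \tag{4.7} \]

where $\delta = \frac{c_2 - c_1 + o(1)}{2\sqrt{\epsilon^3 n}}$. Let $X_1, X_2, \ldots$ be independent geometric random
variables with mean $\frac{1}{1-\mu}$, i.e., $\mathbb P(X_i = k) = \mu^{k-1}(1-\mu)$ for $k = 1, 2, \ldots$;
further set $S_k = \sum_{i=1}^{k} (X_i - 1)$. Since $S_k$ follows a negative binomial distribution,

\[ \mathbb P(S_r = s - r) = \binom{s-1}{s-r} (1-\mu)^r \mu^{s-r} . \]

Using Stirling's formula, we get that for some constant $c_3 > 0$,

\[ \mathbb P(S_r = s - r) \ge \frac{c_3}{\sqrt r} \Big(\frac{s-1}{r-1}\Big)^{r-1} \Big(\frac{s-1}{s-r}\Big)^{s-r} (1-\mu)^r \mu^{s-r} . \]

Substituting (4.7) in the above, and using the fact that

\[ \frac{s-1}{s-r} = \frac{(1+\delta)/(1-\mu)}{(1+\delta)/(1-\mu) - 1} = \frac{1+\delta}{\delta+\mu} , \]

we obtain that

\[ \mathbb P(S_r = s - r) \ge \frac{c_3 (1-\mu)}{\sqrt r} \cdot \exp\Big[ \frac{r-1}{1-\mu}\, g(\delta) \Big] , \]

where

\[ g(x) = (1+x)\log(1+x) + (x+\mu)\log\Big(\frac{\mu}{x+\mu}\Big) . \]

Clearly, we have that $g(0) = 0$ and a standard calculation yields that

\[ g'(x) = \log(1+x) + \log\Big(\frac{\mu}{x+\mu}\Big) \quad\text{and}\quad g''(x) = -\frac{1-\mu}{(x+\mu)(1+x)} . \]

In particular, we have that $g'(0) = 0$ and $|g''(x)| \le 2(1-\mu)$ when $|x| \le |\delta|$,
where $\delta$ is defined as above (recall that $\delta = o(1)$). Therefore, for all such $x$
we have $|g(x)| \le 2(1-\mu)x^2$, and altogether,

\[ \mathbb P(S_r = s - r) \ge \frac{c_3(1-\mu)}{\sqrt r}\, e^{-2 r \delta^2} . \]

Since

\[ r\delta^2 = \frac{1+o(1)}{2}(c_2 - c_1)^2 \le (2+o(1))M^2 \]

and $(1-\mu)/\sqrt r = (1+o(1))/\sqrt{2\epsilon n}$, we deduce that for some $c > 0$,

\[ \mathbb P(S_r = s - r) \ge c/\sqrt{\epsilon n} , \]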

completing the proof.

4.3. Contiguity of the two models. We are now ready to prove the main
result of this section, Theorem 4.2, which reduces the Poisson-configuration
model to the Poisson-geometric model.

Proof. For some constant $M > 0$ to be specified later, define the event

\[ A_M = \Big\{ \Lambda \in I_M ,\ \ker(H) \in \mathcal B_M ,\ \Big| |E(H)| - \frac n2(\lambda - \mu)\Big(1 - \frac\mu\lambda\Big) \Big| \le M\sqrt{\epsilon n} \Big\} . \]

Fix $\delta > 0$. We claim that for a sufficiently large $M = M(\delta)$ we have
$\mathbb P(A_M) \ge 1 - \delta$. To see this, note the following:

1. In the Poisson-configuration model, $\Lambda \sim \mathcal N(\Lambda_0, \frac{1}{\epsilon n})$, and $I_M$ includes
at least $M$ standard deviations about its mean.

2. Each of the variables $|\mathcal K|$ and $|E(\mathcal K)|$ is a sum of i.i.d. random variables
with variance $O(\epsilon^3 n)$ and mean as specified in the definition of $\mathcal B_M$,
hence their concentration follows from CLT.

3. Finally, $|E(H)|$ is again a sum of i.i.d. variables and has variance $O(\epsilon n)$,
only here we must subtract the vertices that comprise disjoint cycles.
By (3.7) and the estimate in Proposition 3.4 on the size of the 2-core in
the Poisson-configuration model, the number of such vertices is $O(1/\epsilon)$
w.h.p. Compared to the standard deviation of $O(\sqrt{\epsilon n})$, this amounts
to a negligible error, as $\epsilon^3 n \to \infty$.

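Given an integer $s$ and a kernel $\mathcal K$, let $\mathcal D_{s,\mathcal K}$ denote the set of all possible 2-cores
with $s$ edges and kernel $\mathcal K$. Crucially, the distribution of the
Poisson-configuration model given $|E(H)| = s$ and $\ker(H) = \mathcal K$ is uniform over
$\mathcal D_{s,\mathcal K}$, and so is the Poisson-geometric model given $|E(\tilde H)| = s$ and
$\ker(\tilde H) = \mathcal K$. Therefore, for any graph $D \in \mathcal D_{s,\mathcal K}$,

\[ \frac{\mathbb P(H = D \mid \ker(H) = \mathcal K)}{\mathbb P(|E(H)| = s \mid \ker(H) = \mathcal K)} = \frac{\mathbb P(\tilde H = D \mid \ker(\tilde H) = \mathcal K)}{\mathbb P(|E(\tilde H)| = s \mid \ker(\tilde H) = \mathcal K)} . \]

Combining Lemmas 4.4 and 4.5 we get that for some $c = c(M) > 0$,

\[ \frac{\mathbb P(|E(H)| = s ,\ A_M \mid \ker(H) = \mathcal K)}{\mathbb P(|E(\tilde H)| = s \mid \ker(\tilde H) = \mathcal K)} \le c . \]

Recalling that $\mathbb P(A_M) \ge 1 - \delta$ and letting $\delta \to 0$, we deduce that for any
family of graphs $\mathcal A$, if $\mathbb P(\tilde H \in \mathcal A) \to 0$ then also $\mathbb P(H \in \mathcal A) \to 0$. ■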

5. Constructing the giant component

Throughout the section, let $p = (1+\epsilon)/n$, where $\epsilon \to 0$ and $\epsilon^3 n \to \infty$
with $n$, and let $G$ be a random graph $G \sim \mathcal G(n,p)$. We begin by analyzing
the "bushes", i.e., the trees that are attached to $G^{(2)}$, the 2-core of $G$.
As before, $\mu < 1$ is defined to be the conjugate of $\lambda = 1 + \epsilon$ (see (4.1)).
Since $\epsilon \to 0$, we can infer from a standard Taylor expansion that

\[ \mu = 1 - \epsilon + \tfrac23 \epsilon^2 + O(\epsilon^3) . \tag{5.1} \]
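
The expansion (5.1) is easy to check numerically. The following sketch (function names are hypothetical) solves the conjugacy relation (4.1) by bisection on $\log x - x$, which is increasing on $(0,1)$, and compares the root with the second-order Taylor polynomial; the difference should be of order $\epsilon^3$.

```python
import math


def conjugate(lam):
    """Solve mu * exp(-mu) = lam * exp(-lam) for the root mu in (0, 1)."""
    target = math.log(lam) - lam
    lo, hi = 1e-12, 1.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if math.log(mid) - mid < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def conjugate_taylor(eps):
    """Second-order approximation from (5.1): 1 - eps + (2/3) eps^2."""
    return 1 - eps + (2 / 3) * eps ** 2


if __name__ == "__main__":
    for eps in (0.1, 0.05, 0.01):
        mu = conjugate(1 + eps)
        # the last column should stay bounded as eps decreases
        print(eps, mu, conjugate_taylor(eps), (mu - conjugate_taylor(eps)) / eps ** 3)
```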

Proof of Theorem 2. In what follows, we use the abbreviation PGW($\mu$)-tree
for a Poisson($\mu$)-Galton-Watson tree. Let $\hat{\mathcal C}_1$ denote the graph obtained
as follows:

• Let $H$ be a copy of $\mathcal C_1^{(2)}$ (the 2-core of the giant component of $G$).

• For each $v \in H$, attach an independent PGW($\mu$) tree rooted at $v$.

By this definition, $\mathcal C_1$ and $\hat{\mathcal C}_1$ have exactly the same 2-core $H$. For simplicity,
we will refer directly to $H$ as the 2-core of the model, whenever the context
of either $\mathcal C_1$ or $\hat{\mathcal C}_1$ is clear. We first establish the contiguity of $\mathcal C_1$ and $\hat{\mathcal C}_1$.
Define the bushes of $\mathcal C_1$ as follows:

\[ T_u = \{ v \in \mathcal C_1 : v \text{ is connected to } u \text{ in } \mathcal C_1 \setminus H \} \quad \text{for } u \in H . \]

Clearly, each $T_u$ is a tree as it is connected and does not contain any cycles
(its vertices were not included in the 2-core). To conclude, we go from $H$ to
$\mathcal C_1$ by attaching a tree $T_u$ to each vertex $u \in H$ (while identifying the root
of $T_u$ with $u$). Analogously, let $\{\hat T_u\}_{u \in H}$ be the corresponding bushes in $\hat{\mathcal C}_1$.

We next introduce notations for the labeled and unlabeled trees as well
as their distributions. For $t \in \mathbb N$, let $\mathcal R_t$ be the set of all labeled rooted trees
on the vertex set $[t]$, and let $U_t$ be chosen uniformly at random from $\mathcal R_t$. For
$T \in \mathcal R_t$ and a bijection $\phi$ on $[t]$, let $\phi(T)$ be the tree obtained by relabeling
the vertices in $T$ according to $\phi$. Furthermore, define

\[ T' = \{ \phi(T) : \phi \text{ is a bijection on } [t] \} \]

to be the corresponding rooted unlabeled tree.

Let $\{t_u : u \in H\}$ be some integers. Conditioning on the event

\[ \{ |T_u| = t_u \text{ for all } u \in H \} , \]

we know from the definition of $\mathcal G(n,p)$ that $T_u$ is distributed independently
and uniformly among all labeled trees of size $t_u$ rooted at $u$. In particular,
in that case each $T_u'$ is independently distributed as $U_{t_u}'$ (the unlabeled
counterparts of $T_u$ and $U_{t_u}$).

On the other hand, Aldous [2] (see also, e.g., [3]) observed that, if $T$ is a
PGW-tree then $T'$ has the same distribution as $U_t'$ on the event $\{|T| = t\}$.
Therefore, conditioning on the event

\[ \{ |\hat T_u| = t_u \text{ for all } u \in H \} , \]

we also get that $\hat T_u'$ has the same distribution as $U_{t_u}'$.

We therefore turn to study the sizes of the bushes in $\mathcal C_1$ and $\hat{\mathcal C}_1$. Letting
$\{t_u : u \in H\}$ be some integers and writing

\[ N = \sum_{u \in H} t_u , \]

we claim that by definition of $\mathcal G(n,p)$ every extension of the 2-core $H$ to the
component $\mathcal C_1$, using trees whose sizes sum up to $N$, has the same probability.
To see this, fix $H$, and notice that the probability of obtaining a component
whose 2-core is $H$ and an extension $X$ connecting it to $N - |H|$ additional
vertices only depends on the number of edges in $H$ and $X$ (and the fact that
this is a legal configuration, i.e., $H$ is a valid 2-core and $X$ is comprised of
trees). Therefore, upon conditioning on $H$ the probabilities of the various
extensions $X$ remain all equal. Cayley's formula gives that there are $m^{m-1}$
labeled rooted trees on $m$ vertices, and so,

\[
\begin{aligned}
\mathbb P(|T_u| = t_u \text{ for all } u \in H \mid H) &= \mathbb P(|\mathcal C_1| = N \mid H)\ \mathbb P\big(|T_u| = t_u \text{ for all } u \in H \mid |\mathcal C_1| = N ,\ H\big) \\
&= \mathbb P(|\mathcal C_1| = N \mid H)\ \frac{1}{Z'(N)} \prod_{u \in H} \frac{t_u^{t_u - 1}}{t_u!} \\
&= \mathbb P(|\mathcal C_1| = N \mid H)\ \frac{1}{Z(N)} \prod_{u \in H} \frac{t_u^{t_u-1}}{\mu\, t_u!} (\mu e^{-\mu})^{t_u} ,
\end{aligned}
\tag{5.2}
\]

where $Z(N)$ and $Z'(N)$ are the following normalizing constants:

\[ Z'(N) = \sum_{\{k_u\} : \sum_{u\in H} k_u = N}\ \prod_{u \in H} \frac{k_u^{k_u - 1}}{k_u!} , \qquad Z(N) = Z'(N)\, \mu^{N - |H|} e^{-\mu N} . \]

Notice that the size of a Poisson($\gamma$)-Galton-Watson tree $T$ follows a Borel($\gamma$)
distribution (see, e.g., [30]), namely,

\[ \mathbb P(|T| = t) = \frac{t^{t-1}}{\gamma\, t!} (\gamma e^{-\gamma})^t . \tag{5.3} \]

Recalling that $\hat T_u$ are independent PGW($\mu$)-trees, it follows that

\[ Z(N) = \sum_{\{k_u\} : \sum_{u \in H} k_u = N}\ \prod_{u \in H} \mathbb P(|\hat T_u| = k_u) = \mathbb P(|\hat{\mathcal C}_1| = N \mid H) . \]

Combining this with (5.2) and (5.3), we obtain that

\[ \frac{\mathbb P(|T_u| = t_u \text{ for all } u \in H \mid H)}{\mathbb P(|\hat T_u| = t_u \text{ for all } u \in H \mid H)} = \frac{\mathbb P(|\mathcal C_1| = N \mid H)}{\mathbb P(|\hat{\mathcal C}_1| = N \mid H)} . \tag{5.4} \]

At this point, we wish to estimate the ratio in the right hand side above. To
this end, we need the following result of [31], which we restate in our setting
of the near-critical regime.

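Theorem 5.1 ([31, Theorem 6], reformulated). Let
$b(\lambda) = \big(b_1(\lambda), b_2(\lambda), b_3(\lambda)\big)^T$ where

\[ b_1(\lambda) = (1-\mu)\Big(1 - \frac\mu\lambda\Big) , \quad b_2(\lambda) = \mu\Big(1 - \frac\mu\lambda\Big) , \quad b_3(\lambda) = \frac12\Big(1 - \frac\mu\lambda\Big)(\lambda + \mu - 2) . \]

There exist positive definite matrices $K_p, K_m$ satisfying

\[ K_p = \begin{pmatrix} (12+o(1))\epsilon & 4+o(1) & (6+o(1))\epsilon^2 \\ 4+o(1) & (2+o(1))/\epsilon & (2+o(1))\epsilon \\ (6+o(1))\epsilon^2 & (2+o(1))\epsilon & \big(\tfrac{10}{3}+o(1)\big)\epsilon^3 \end{pmatrix} , \]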

K^ = Kp- 2A^ . , det(K^) = (I + o{l))e' , 

and such that

(i) $\big(|H| ,\ |\mathcal C_1| - |H| ,\ |E(H)| - |H|\big)$ is in the limit Gaussian with a mean
vector $n b$ and a covariance matrix $n K_p$.

(ii) If $A_m = K_m^{-1}$ and $B$ denotes the event that $|E(G)| = m$ for some
$m = (1 + (1+o(1))\epsilon)\frac n2$, and there is a unique component of size between
$\epsilon n$ and $4\epsilon n$ and none larger, then

\[ \mathbb P\big( |H| = n_1 ,\ |\mathcal C_1| - |H| = n_2 ,\ |E(H)| - |H| = n_3 \mid B \big) = \frac{\sqrt3 + o(1)}{8(\pi n \epsilon)^{3/2}} \exp\Big(-\frac12 x^T A_m x\Big) , \tag{5.5} \]

uniformly for all $(n_1, n_2, n_3) \in \mathbb N^3$ such that

\[ \big( K_p(1,1)^{-1/2} x_1 ,\ K_p(2,2)^{-1/2} x_2 ,\ K_p(3,3)^{-1/2} x_3 \big) \]

is bounded, where $x^T = (x_1, x_2, x_3)$ is defined by

\[ x^T = \frac{1}{\sqrt n}\big( n_1 - b_1 n ,\ n_2 - b_2 n ,\ n_3 - b_3 n \big) . \]

Since $\epsilon^3 n \to \infty$, it is clear by CLT that w.h.p. the total number of edges
in $G \sim \mathcal G(n,p)$ is $(1 + (1+o(1))\epsilon)\frac n2$. Furthermore, by the results of [10] and
[24] (see also [19]), w.h.p. our graph $G$ has a unique giant component of size

\[ (1+o(1))(1 - \mu/\lambda) n = (2+o(1))\epsilon n . \]

Altogether, we deduce that the event $B$ happens w.h.p.; assume therefore
that $B$ indeed occurs. Define the event $Q$ by

\[
\begin{aligned}
Q_M &= \Big\{ (n_1, n_2, n_3) \in \mathbb N^3 : |x_1| \le M\sqrt\epsilon ,\ |x_2| \le M/\sqrt\epsilon ,\ |x_3| \le M\epsilon^{3/2} \Big\} , \\
Q &= \Big\{ \big(|H| ,\ |\mathcal C_1| - |H| ,\ |E(H)| - |H|\big) \in Q_M \Big\} .
\end{aligned}
\]

By part (i) of Theorem 5.1, for any fixed $\delta > 0$ there exists some $M > 0$
such that $\mathbb P(Q^c) < \delta$ for a sufficiently large $n$. Next, define

\[
\begin{aligned}
P_{\max} &= \max_{(n_1,n_2,n_3) \in Q_M} \mathbb P\big(|H| = n_1 ,\ |\mathcal C_1| - |H| = n_2 ,\ |E(H)| - |H| = n_3\big) , \\
P_{\min} &= \min_{(n_1,n_2,n_3) \in Q_M} \mathbb P\big(|H| = n_1 ,\ |\mathcal C_1| - |H| = n_2 ,\ |E(H)| - |H| = n_3\big) .
\end{aligned}
\]

It follows from part (ii) of Theorem 5.1 that there exists some $c = c(M) > 0$
such that

\[ P_{\max} \le c \cdot P_{\min} \tag{5.6} \]

when $n$ is sufficiently large. Notice that by definition of $x$,

\[ \#\{ n_2 \in \mathbb N : |x_2| \le M/\sqrt\epsilon \} \ge M\sqrt{n/\epsilon} . \]

Combined with (5.6), it follows that for any $(n_1, n_2, n_3) \in Q_M$ we have

\[ \mathbb P\big(|\mathcal C_1| = n_1 + n_2 ,\ Q \mid |H| = n_1\big) \le \frac{c}{M\sqrt{n/\epsilon}} . \tag{5.7} \]

With this estimate for $\mathbb P(|\mathcal C_1| = N \mid H)$, the numerator in the right-hand-side
of (5.4), it remains to estimate the denominator, $\mathbb P(|\hat{\mathcal C}_1| = N \mid H)$.

Recall that, given $H$, the quantity $|\hat{\mathcal C}_1|$ is a sum of $|H|$ i.i.d. Borel($\mu$)
random variables (each such variable is the size of a PGW($\mu$)-tree). We
would now like to derive a local central limit theorem for $|\hat{\mathcal C}_1|$. Unfortunately,
each Borel($\mu$) variable $|\hat T_u|$ has $\mathrm{Var}\,|\hat T_u| \asymp 1/\epsilon^3 \to \infty$, and standard versions
of local CLT do not apply here. To bypass this obstacle, we use a different
characterization of the tree-sizes $\{|\hat T_u| : u \in H\}$.

It is well known that the total progeny in a branching process with
offspring distribution $Z$ has the same law as the hitting time from 1 to 0 of
a one-dimensional random walk whose increments are i.i.d. variables
distributed as $Z - 1$ (see, e.g., [34, page 234]). Hence, the total size of $k$ i.i.d.
such branching processes is exactly the hitting time of this walk from $k$ to
0. The following theorem of Otter [29] characterizes this quantity:

Theorem 5.2 ([29], see also [20]). Let $W_t$ be a random walk, whose steps
$Y_i$ are i.i.d. random variables satisfying $Y_i \ge -1$. Then

\[ \mathbb P_k(\tau_0 = t) = \frac kt\, \mathbb P_k(W_t = 0) , \]

where $\tau_0$ is the first time the walk hits 0 and $\mathbb P_k$ denotes its law when
started at $k$.

In our setting, $Y_i \sim \mathrm{Po}(\mu) - 1$, hence $\mathbb P_k(W_t = 0) = \mathbb P(S_t = t - k)$ where
$S_t = \sum_{i=1}^{t} X_i$ and the $X_i$'s are i.i.d. $\mathrm{Po}(\mu)$ variables. In light of the above
theorem, it follows that for any integer $n_2$,

\[ \mathbb P\big(|\hat{\mathcal C}_1| = n_1 + n_2 \mid |H| = n_1\big) = \frac{n_1}{n_1 + n_2} \cdot \mathbb P(S_{n_1 + n_2} = n_2) . \tag{5.8} \]
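
As a sanity check on Theorem 5.2 in the Poisson setting, the sketch below (hypothetical function names) evaluates the law of the total size of $k$ independent PGW($\mu$)-trees via $\frac kt\,\mathbb P(\mathrm{Po}(\mu t) = t - k)$. For $k = 1$ this should coincide with the Borel($\mu$) mass function of (5.3), and for subcritical $\mu$ the probabilities should sum to 1.

```python
import math


def log_poisson_pmf(mean, k):
    """log P(Po(mean) = k) for mean > 0 and integer k >= 0."""
    return -mean + k * math.log(mean) - math.lgamma(k + 1)


def total_progeny_pmf(k, t, mu):
    """P(total size of k independent PGW(mu)-trees = t), via (k/t) P(S_t = t - k)."""
    if t < k:
        return 0.0
    return (k / t) * math.exp(log_poisson_pmf(mu * t, t - k))


def borel_pmf(t, mu):
    """Borel(mu) mass function as in (5.3): t^(t-1) (mu e^(-mu))^t / (mu t!)."""
    log_pmf = ((t - 1) * math.log(t) + t * (math.log(mu) - mu)
               - math.log(mu) - math.lgamma(t + 1))
    return math.exp(log_pmf)
```

Setting `k = n1` and `t = n1 + n2` gives the right-hand side of (5.8).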
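Since $S_m \sim \mathrm{Po}(m\mu)$ for any $m$, we have that

\[ \mathbb P(S_{n_1+n_2} = n_2) = e^{-(n_1+n_2)\mu}\, \frac{\big((n_1+n_2)\mu\big)^{n_2}}{n_2!} . \tag{5.9} \]

Recalling the definition of $Q_M$, we are interested in the following range for
$n_1$ and $n_2$:

\[
\begin{aligned}
n_1 &= (1-\mu)\Big(1 - \frac\mu\lambda\Big) n + c_1\sqrt{\epsilon n} \quad (|c_1| \le M) , \\
n_2 &= \mu\Big(1 - \frac\mu\lambda\Big) n + c_2\sqrt{n/\epsilon} \quad (|c_2| \le M) .
\end{aligned}
\]

In this case, we have

\[ \frac{n_1 + n_2}{n_2} = \frac{\big(1 - \frac\mu\lambda\big) n + c_1\sqrt{\epsilon n} + c_2\sqrt{n/\epsilon}}{\mu\big(1 - \frac\mu\lambda\big) n + c_2\sqrt{n/\epsilon}} = \frac1\mu + \frac{c_1\sqrt{\epsilon n} + c_2\big(1 - \frac1\mu\big)\sqrt{n/\epsilon}}{\mu\big(1 - \frac\mu\lambda\big) n + c_2\sqrt{n/\epsilon}} = \frac{1+\delta}{\mu} , \]

where $\delta = \delta(n) = (\frac12 + o(1))(c_1 - c_2)/\sqrt{\epsilon n}$. Applying Stirling's formula to
(5.9) and using the fact that $1 + x \ge \exp(x - x^2)$ for $x \ge -\frac12$ gives

\[
\begin{aligned}
\mathbb P(S_{n_1+n_2} = n_2) &= \frac{1+o(1)}{\sqrt{2\pi n_2}} \exp\big[-(n_1+n_2)\mu + n_2\big] \Big(\frac{(n_1+n_2)\mu}{n_2}\Big)^{n_2} \\
&= \frac{1+o(1)}{\sqrt{2\pi n_2}}\, e^{-\delta n_2}(1+\delta)^{n_2} \ge \frac{1+o(1)}{\sqrt{2\pi n_2}} \exp(-\delta^2 n_2) .
\end{aligned}
\]

Now, since $n_2 = (2+o(1))\epsilon n$ and

\[ \delta^2 n_2 = (1+o(1))\frac{(c_1 - c_2)^2}{4\epsilon n}\, 2\epsilon n \le (2+o(1))M^2 , \]

we conclude that for some constant $\delta' = \delta'(M)$,

\[ \mathbb P(S_{n_1+n_2} = n_2) \ge \frac{\delta'}{\sqrt{n\epsilon}} . \]

Recalling that $\frac{n_1}{n_1+n_2} = (1+o(1))\epsilon$, we can decrease $\delta'$ to absorb this $o(1)$
error-term for a sufficiently large $n$, and together with (5.8) get

\[ \mathbb P\big(|\hat{\mathcal C}_1| = n_1 + n_2 \mid |H| = n_1\big) \ge \frac{\delta' \epsilon}{\sqrt{n\epsilon}} = \frac{\delta'}{\sqrt{n/\epsilon}} . \tag{5.10} \]

Combining (5.7) and (5.10), we obtain that when $n$ is sufficiently large

\[ \frac{\mathbb P(|\mathcal C_1| = N ,\ Q \mid H)}{\mathbb P(|\hat{\mathcal C}_1| = N \mid H)} \le \frac{c}{M\delta'} . \]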
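By (5.4) (and recalling the fact that conditioned on $|T_u|$, the tree $T_u$ is
uniformly distributed among all unlabeled trees of this size, and a similar
statement holds for $\hat T_u$), we conclude that for some $c' = c'(M) > 0$ and any
unlabeled graph $A$,

\[ \mathbb P(\mathcal C_1 = A ,\ Q ,\ B \mid H) \le c'\, \mathbb P(\hat{\mathcal C}_1 = A \mid H) . \tag{5.11} \]

We are now ready to conclude the proof of the main theorem. Let $\tilde{\mathcal C}_1$ be
defined as in Theorem 2. For any set of simple graphs $\mathcal A$, define

\[ \mathcal H = \Big\{ H : \mathbb P\big(\mathcal C_1 \in \mathcal A ,\ Q ,\ B \mid \mathcal C_1^{(2)} = H\big) \ge \big(\mathbb P(\tilde{\mathcal C}_1 \in \mathcal A)\big)^{1/2} \Big\} . \tag{5.12} \]

Recall that by definition, $\tilde{\mathcal C}_1$ is produced by first constructing its 2-core
(first two steps of the description), then attaching to each of its vertices
independent PGW($\mu$)-trees. Hence, for any $H$, the graphs $\hat{\mathcal C}_1$ and $\tilde{\mathcal C}_1$ have
the same conditional distribution given $\hat{\mathcal C}_1^{(2)} = \tilde{\mathcal C}_1^{(2)} = H$. It then follows
from (5.11), (5.12) that for some constant $c'' > 0$ and any $H \in \mathcal H$,

\[ \mathbb P\big(\tilde{\mathcal C}_1 \in \mathcal A \mid \tilde{\mathcal C}_1^{(2)} = H\big) \ge c'' \big(\mathbb P(\tilde{\mathcal C}_1 \in \mathcal A)\big)^{1/2} . \]

Since

\[ \mathbb P(\tilde{\mathcal C}_1 \in \mathcal A) \ge c'' \big(\mathbb P(\tilde{\mathcal C}_1 \in \mathcal A)\big)^{1/2}\, \mathbb P\big(\tilde{\mathcal C}_1^{(2)} \in \mathcal H\big) , \]

the assumption that $\mathbb P(\tilde{\mathcal C}_1 \in \mathcal A) \to 0$ now gives that $\mathbb P(\tilde{\mathcal C}_1^{(2)} \in \mathcal H) \to 0$.

At this point, we combine all the contiguity results thus far to claim that,
for any family of simple graphs $\mathcal F$,

\[ \mathbb P\big(\tilde{\mathcal C}_1^{(2)} \in \mathcal F\big) = o(1) \quad\text{implies that}\quad \mathbb P\big(\mathcal C_1^{(2)} \in \mathcal F\big) = o(1) . \]

Indeed, by definition, the 2-core of $\tilde{\mathcal C}_1$ is precisely the Poisson-geometric
model, conditioned on the sum of the degrees ($\sum_u D_u \mathbf 1_{D_u \ge 3}$) being even.
Therefore, as $\mathcal F$ consists only of simple graphs, clearly we may consider
this model conditioned on the graph produced being simple, and in
particular, that $\sum_u D_u \mathbf 1_{D_u \ge 3}$ is even. Applying Theorem 4.2 (contiguity with
Poisson-configuration), Theorem 3.2 (contiguity with Poisson-cloning) and
Theorem 2.1 (contiguity with Erdős-Rényi graphs), in that order, now gives
the above statement.

This fact and the arguments above now give that $\mathbb P(\mathcal C_1^{(2)} \in \mathcal H) \to 0$. By
the definition of $\mathcal H$, we now conclude that

\[ \mathbb P(\mathcal C_1 \in \mathcal A) \le \mathbb P(B^c) + \mathbb P(Q^c) + \mathbb P\big(\mathcal C_1^{(2)} \in \mathcal H\big) + \big(\mathbb P(\tilde{\mathcal C}_1 \in \mathcal A)\big)^{1/2} . \]

Taking a limit, we get that $\limsup_{n\to\infty} \mathbb P(\mathcal C_1 \in \mathcal A) \le \delta$ and the proof is
completed by letting $\delta \to 0$. ■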

6. The early giant component 

In this section, we consider the special case of Theorem 2 for $\epsilon = o(n^{-1/4})$,
and namely prove Theorem 1. We show how each of the three steps described
in Theorem 2 reduces to the corresponding steps in Theorem 1.

6.1. Step 1: The kernel. Let $\Lambda \sim \mathcal N(1 + \epsilon - \mu, \frac{1}{\epsilon n})$. By (5.1), we have
$\mu = 1 - \epsilon + O(\epsilon^2)$, and so

\[ \mathbb E\Lambda = 1 + \epsilon - \mu = 2\epsilon + O(\epsilon^2) , \qquad \mathrm{Var}(\Lambda) = \frac{1}{\epsilon n} , \]

giving that $\Lambda = (2+o(1))\epsilon$ w.h.p. In particular, the probability that $D_u \ge 4$
for some vertex $u$ is

\[ \mathbb P(\mathrm{Po}(\Lambda) \ge 4) = O(\epsilon^4) = o(n^{-1}) , \]

and a union bound thus implies that the kernel is 3-regular w.h.p. In other
words, we have $N_k = 0$ for all $k \ge 4$, and

\[ N = N_3 \sim \mathrm{Bin}(n, e^{-\Lambda}\Lambda^3/6) \text{ conditioned to be even.} \]

It remains to compare the distributions of $N$ and $2\lfloor Z \rfloor$, where
$Z \sim \mathcal N(\frac23 \epsilon^3 n, \epsilon^3 n)$.

The first step in this direction is to approximate the binomial variable by
a Poisson variable. A well-known and straightforward application of the
Stein-Chen method (see, e.g., [5]) is that for any $n$ and $q$,

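\[ \| \mathrm{Bin}(n, q) - \mathrm{Po}(nq) \|_{\mathrm{TV}} \le q \wedge n q^2 , \]

where the total-variation distance $\|\cdot\|_{\mathrm{TV}}$ between two distributions $\sigma, \pi$ on
a finite space $\Omega$ is given by

\[ \| \sigma - \pi \|_{\mathrm{TV}} = \sup_{A \subset \Omega} |\sigma(A) - \pi(A)| = \frac12 \sum_{x \in \Omega} |\sigma(x) - \pi(x)| . \tag{6.1} \]

Therefore, given that $\Lambda = (2+o(1))\epsilon$ (again, this holds w.h.p.) we have

\[ \| \mathrm{Bin}(n, e^{-\Lambda}\Lambda^3/6) - \mathrm{Po}(n e^{-\Lambda}\Lambda^3/6) \|_{\mathrm{TV}} \le O(n\epsilon^6) = o(n^{-1/2}) . \]

Clearly, by definition (6.1), a negligible total-variation distance between two
distributions already implies they are contiguous (in both directions), hence
it suffices to compare $2\lfloor Z \rfloor$ to the variable $Y$, distributed as $\mathrm{Po}(n e^{-\Lambda}\Lambda^3/6)$
conditioned to be even. We will show that for some region $Q$ such that
$\mathbb P(Y \in Q) \to 1$ and some $c = c(Q) > 0$,

\[ \mathbb P(Y = t) \le c \cdot \mathbb P(2\lfloor Z \rfloor = t) \quad \text{for all even } t \in Q . \tag{6.2} \]

Let $\delta > 0$. By the above properties of $\Lambda$, there exists some $M > 0$ such that

\[ \mathbb P\big(|\Lambda - 2\epsilon| > M/\sqrt{\epsilon n}\big) < \delta . \]

The following result is a special case of a theorem of [14].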
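Theorem 6.1 ([14, Ch. 2, Theorem 5.2], reformulated). Let $X$ be a random
variable on $\mathbb N$ with $\mathbb P(X = k) > 0$ for all $k \in \mathbb N$. Suppose that $\mathbb E X = \nu < \infty$
and $\mathrm{Var}\, X = \sigma^2 < \infty$. Let $X_i$ be i.i.d. distributed as $X$ and $S_m = \sum_{i=1}^{m} X_i$.
Then as $m \to \infty$, we have

\[ \sup_{x \in \mathcal L_m} \Big| \sqrt m\, \mathbb P\Big( \frac{S_m - m\nu}{\sqrt m} = x \Big) - \frac{1}{\sigma\sqrt{2\pi}}\, e^{-x^2/(2\sigma^2)} \Big| \to 0 , \]

where $\mathcal L_m = \{ (z - m\nu)/\sqrt m : z \in \mathbb Z \}$.

In our setting, given that $|\Lambda - 2\epsilon| \le M/\sqrt{\epsilon n}$ we have a Poisson distribution
with parameter

\[ m = \tfrac43 \epsilon^3 n + O(\sqrt{\epsilon^3 n}) , \]

and clearly the effect of conditioning that it is even, as well as rounding
$m$ to its integer part, only affect the density function by a constant factor.
This translates into a sum of $\lfloor m \rfloor$ i.i.d. $\mathrm{Po}(1)$ variables, and by the above
theorem we conclude that, as long as $|\Lambda - 2\epsilon| \le M/\sqrt{\epsilon n}$, there exists some
$c = c(M) > 0$ such that

\[ \mathbb P(Y = t) \le \frac{c}{\sqrt{\epsilon^3 n}} \quad \text{for any integer } t . \]

Furthermore, we can choose $Q = [m - M'\sqrt{\epsilon^3 n},\ m + M'\sqrt{\epsilon^3 n}]$ for a suitably
large $M' > 0$ so that

\[ \mathbb P(Y \notin Q) < \delta . \]

By the definition of $Z$ (note that $2\lfloor Z \rfloor$ has mean $\frac43 \epsilon^3 n + O(1)$ and variance
of order $\epsilon^3 n$), there exists some $c' = c'(M')$ such that

\[ \mathbb P(2\lfloor Z \rfloor = t) \ge \frac{c'}{\sqrt{\epsilon^3 n}} \quad \text{for all even } t \in Q . \]

Altogether, it follows that for any sequence of subsets of integers $S = S(n)$,
if $\mathbb P(2\lfloor Z \rfloor \in S) = o(1)$ then $\mathbb P(N \in S) \le 2\delta + o(1)$. We now let $\delta \to 0$ to
complete the contiguity of $N$ and $2\lfloor Z \rfloor$.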
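
The comparison (6.2) can be illustrated numerically. In the sketch below (hypothetical names), the parameter `v` stands for $\epsilon^3 n$, the variable $Y$ is taken as $\mathrm{Po}(\frac43 v)$ conditioned to be even (that is, the fluctuation of $\Lambda$ is ignored), and $Z \sim \mathcal N(\frac23 v, v)$. The function returns the largest ratio $\mathbb P(Y = t)/\mathbb P(2\lfloor Z\rfloor = t)$ over even $t$ within `width` multiples of $\sqrt v$ of the mean; a Gaussian heuristic suggests this ratio stays close to $\sqrt 3$ near the center and decays away from it.

```python
import math


def poisson_even_pmf(mean, t):
    """P(Y = t) for Y ~ Po(mean) conditioned to be even (t must be even)."""
    log_pmf = -mean + t * math.log(mean) - math.lgamma(t + 1)
    prob_even = (1 + math.exp(-2 * mean)) / 2   # P(Po(mean) is even)
    return math.exp(log_pmf) / prob_even


def normal_cdf(x, mean, var):
    return 0.5 * (1 + math.erf((x - mean) / math.sqrt(2 * var)))


def rounded_normal_pmf(v, t):
    """P(2 * floor(Z) = t) for Z ~ N(2v/3, v), i.e. P(t/2 <= Z < t/2 + 1)."""
    return normal_cdf(t / 2 + 1, 2 * v / 3, v) - normal_cdf(t / 2, 2 * v / 3, v)


def max_ratio(v, width=3.0):
    """Largest P(Y = t) / P(2 floor(Z) = t) over even t near the common mean 4v/3."""
    mean = 4 * v / 3
    half = width * math.sqrt(v)
    lo = 2 * math.ceil((mean - half) / 2)
    hi = 2 * math.floor((mean + half) / 2)
    return max(poisson_even_pmf(mean, t) / rounded_normal_pmf(v, t)
               for t in range(lo, hi + 1, 2))
```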

6.2. Step 2: The 2-core. Here we need to compare the effect of replacing
the 2-paths by i.i.d. $\mathrm{Geom}(\epsilon)$ variables rather than $\mathrm{Geom}(1-\mu)$. Rather
than just showing contiguity between the two models, we will show a stronger
statement, namely that the total-variation distance between the joint
distributions of the path lengths is negligible. The total-variation distance
$\|\cdot\|_{\mathrm{TV}}$ between two distributions $\sigma, \pi$ on a finite space $\Omega$ is given by

\[ \| \sigma - \pi \|_{\mathrm{TV}} = \sup_{A \subset \Omega} |\sigma(A) - \pi(A)| = \frac12 \sum_{x \in \Omega} |\sigma(x) - \pi(x)| . \]

In our case, since the ratio $\frac{p(1-p)^k}{q(1-q)^k}$ for $0 < p < q < 1$ is monotone
increasing in $k$, we can clearly consider sets of the form $A = \{k, k+1, \ldots\}$ in the
supremum above. Hence,

\[ \| \mathrm{Geom}(1-\mu) - \mathrm{Geom}(\epsilon) \|_{\mathrm{TV}} = \sup_k |\mu^k - (1-\epsilon)^k| \le \sup_k |\mu - (1-\epsilon)| \cdot k\,[\mu \vee (1-\epsilon)]^{k-1} = O(\epsilon) , \]

where, as $|\mu - (1-\epsilon)| = O(\epsilon^2)$, the value of $k$ optimizing the above has
order $1/\epsilon$.

Recalling that $|E(\mathcal K)| = O(\epsilon^3 n)$ w.h.p., we infer that the total-variation
distance between the joint distribution of the 2-path lengths in the two
models (i.i.d. $\mathrm{Geom}(1-\mu)$ variables and i.i.d. $\mathrm{Geom}(\epsilon)$ variables) is $O(\epsilon^4 n)$.
Our assumption that $\epsilon = o(n^{-1/4})$ now gives that this is $o(1)$.
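
The tail-set identity used above can be verified directly. The sketch below (hypothetical names; the series are truncated at a user-chosen number of terms) computes the total-variation distance between two geometric laws both as half the $\ell^1$ distance of the mass functions and as the supremum over tail sets, $\sup_k |(1-p)^k - (1-q)^k|$. With $p = 1-\mu$ and $q = \epsilon$ the two values should agree, and both should be of order $\epsilon$.

```python
import math


def conjugate(lam):
    """Solve mu * exp(-mu) = lam * exp(-lam) for the root mu in (0, 1)."""
    target = math.log(lam) - lam
    lo, hi = 1e-12, 1.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if math.log(mid) - mid < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def geometric_tv(p, q, terms):
    """Half the l1 distance between Geom(p) and Geom(q) on {1, 2, ...}, truncated."""
    return 0.5 * sum(abs(p * (1 - p) ** (k - 1) - q * (1 - q) ** (k - 1))
                     for k in range(1, terms + 1))


def geometric_tail_sup(p, q, terms):
    """sup over k of |P(X > k) - P(X' > k)| = |(1-p)^k - (1-q)^k|, truncated."""
    return max(abs((1 - p) ** k - (1 - q) ** k) for k in range(terms))
```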

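6.3. Step 3: The attached trees. We now wish to compare the
distributions of i.i.d. PGW($\mu$)-trees to i.i.d. PGW($1-\epsilon$)-trees. Recall that the size
of a PGW($\gamma$)-tree follows a Borel($\gamma$) distribution, as given in (5.3). Thus,

\[
\begin{aligned}
\| \mathrm{Borel}(\mu) - \mathrm{Borel}(1-\epsilon) \|_{\mathrm{TV}} &= \frac12 \sum_t \frac{t^{t-1}}{t!} \Big| \mu^{t-1} e^{-\mu t} - (1-\epsilon)^{t-1} e^{-(1-\epsilon)t} \Big| \\
&\le O(\epsilon^2) \sum_t \frac{t^{t-1}}{t!}\, x_t^{t-2} e^{-t x_t}\, |t - 1 - t x_t| ,
\end{aligned}
\]

where, by the Mean Value Theorem, each $x_t$ lies between $\mu$ and $1-\epsilon$.
Noticing that $x_t = 1 - \epsilon + O(\epsilon^2)$ and applying Stirling's formula, we get

\[
\begin{aligned}
\frac{t^{t-1}}{t!}\, x_t^{t-2} e^{-t x_t}\, |t - 1 - t x_t| &= \Theta(1)\, t^{-3/2} \big(x_t e^{1 - x_t}\big)^t\, |-1 + t\epsilon + O(t\epsilon^2)| \\
&\le O(t^{-3/2})\, e^{-(1-x_t)^2 t/2}(1 + t\epsilon) \le O(t^{-3/2})\, e^{-\epsilon^2 t/3}(1 + t\epsilon) ,
\end{aligned}
\]

where the first inequality is by the fact that $1 - y \le e^{-y - y^2/2}$ for all $y > 0$,
and the second one holds for any large $n$ by the definition of $x_t$. Therefore,

\[
\begin{aligned}
\| \mathrm{Borel}(\mu) - \mathrm{Borel}(1-\epsilon) \|_{\mathrm{TV}} &= O(\epsilon^2) + O\Big( \epsilon^3 \sum_t t^{-1/2} e^{-\epsilon^2 t/3} \Big) \\
&\le O(\epsilon^2) + O(\epsilon^2) \int_0^\infty \frac{1}{\sqrt{\epsilon^2 x/3}}\, e^{-\epsilon^2 x/3}\, d(\epsilon^2 x/3) = O(\epsilon^2) ,
\end{aligned}
\]

where we used the fact that $\int_0^\infty x^{-1/2} e^{-x}\, dx$ converges (to $\sqrt\pi$).

With high probability, the size of the 2-core (that is, the number of PGW-
trees we attach) is $O(\epsilon^2 n)$, and so the total-variation distance between the
two joint distributions is at most $O(\epsilon^4 n) = o(1)$, as required.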
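
The $O(\epsilon^2)$ bound on the distance between the two Borel laws can also be observed numerically. The sketch below (hypothetical names) evaluates the Borel mass function in log-space and sums half the absolute differences over a truncated range; the truncation point is an assumption chosen so that the neglected tail, which decays like $e^{-\epsilon^2 t/2}$, is negligible.

```python
import math


def conjugate(lam):
    """Solve mu * exp(-mu) = lam * exp(-lam) for the root mu in (0, 1)."""
    target = math.log(lam) - lam
    lo, hi = 1e-12, 1.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if math.log(mid) - mid < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def log_borel_pmf(t, gamma):
    """log of t^(t-1) gamma^(t-1) e^(-gamma t) / t!, the Borel(gamma) mass at t."""
    return ((t - 1) * math.log(t) + (t - 1) * math.log(gamma)
            - gamma * t - math.lgamma(t + 1))


def borel_tv(gamma1, gamma2, terms):
    """Half the l1 distance between Borel(gamma1) and Borel(gamma2), truncated."""
    return 0.5 * sum(abs(math.exp(log_borel_pmf(t, gamma1))
                         - math.exp(log_borel_pmf(t, gamma2)))
                     for t in range(1, terms + 1))
```

For example, `borel_tv(conjugate(1 + eps), 1 - eps, int(50 / eps**2))` divided by `eps**2` should remain bounded as `eps` decreases.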
7. Analysis of the Cut-Off Line Algorithm

In this section, we analyze Algorithm 1 for generating the Poisson cloning
model, and establish a tight concentration result for $\Lambda_C$ (the location of the
cut-off line when all light clones are exhausted), as stated in Theorem 3.3.

Proof of Theorem 3.3. We wish to prove inequality (3.4), i.e., that for
some fixed $c > 0$, the probability that $|\Lambda_C - \lambda\theta_\lambda| \ge \gamma/\sqrt{\epsilon n}$ is at most
$\exp(-c\gamma^2)$.

Notice that, prior to the first time the algorithm reaches Step 2, the notion
of active/passive vertices does not play a role in its decisions. Since this is
the only change between subsequent phases, it follows that $\Lambda_C$ is precisely
the same regardless of the choice of phase boundaries. In particular, we may
choose (3 as follows: Take < /? < ^~2^ arid an integer m such that 



m— 1 




(7.1) 



where 7 as given in the lemma, i.e., 7 = oiy^O'^n). 

In order to prove the lemma, we first estimate the number of $j$-active
clones for each $j$, denoted by $N_j$. Let $M_j$ be the number of $j$-active clones
that are matched during phase $j$. We need the following lemma to estimate
$M_j$ given $N_j$.

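Lemma 7.1 ([22, Lemma 2.2]). Consider a Poisson $\mu$-cell for $\mu > 0$, and
let $N$ be its total number of clones. For $0 \le \theta \le 1$, let $N(\theta)$ be the number
of matched clones once the cut-off line reaches $\theta\mu$. Then there exists some
$c > 0$ so that the following holds: For any $0 < \theta_0 < 1$ and $k, \Delta > 0$,

\[ P\Big(\max_{\theta_0 \le \theta \le 1}\big|N(\theta) - (1-\theta^2)k\big| \ge \Delta \;\Big|\; N = k\Big) \le 2\exp\Big[-c\Big(\Delta \wedge \frac{\Delta^2}{k}\Big)\Big] . \]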
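
The centering term $(1-\theta^2)k$ can be illustrated with a small simulation. The sketch below is not the paper's Algorithm 1: it assumes $k$ clones with i.i.d. uniform values on $[0,1]$ (that is, $\mu$ normalised to 1) and a uniformly random selection rule, and at each step matches the selected clone to the largest unmatched clone, which is where the cut-off line moves. The function name `matched_when_line_reaches` is hypothetical.

```python
import random

def matched_when_line_reaches(k, theta, seed=0):
    """Count matched clones when the cut-off line first drops to theta (mu = 1)."""
    rng = random.Random(seed)
    # values[i] is the value of clone i; clone 0 carries the largest value.
    values = sorted((rng.random() for _ in range(k)), reverse=True)
    unmatched = list(range(k))   # clone indices that are still unmatched
    position = list(range(k))    # position[i] = index of clone i inside `unmatched`
    is_matched = [False] * k

    def match(i):
        # Swap-remove clone i from `unmatched` in O(1).
        p, last = position[i], unmatched[-1]
        unmatched[p] = last
        position[last] = p
        unmatched.pop()
        is_matched[i] = True

    top = 0        # smallest index (largest value) that may still be unmatched
    matched = 0
    while len(unmatched) >= 2:
        # Selection rule (an assumption of this sketch): uniform over unmatched clones.
        match(unmatched[rng.randrange(len(unmatched))])
        while is_matched[top]:
            top += 1
        line = values[top]       # the line moves down to the largest unmatched clone
        match(top)
        matched += 2
        if line <= theta:
            break
    return matched

if __name__ == "__main__":
    k = 20000
    for theta in (0.8, 0.5):
        print(theta, matched_when_line_reaches(k, theta), (1 - theta ** 2) * k)
```

Each step removes two unmatched clones while the line drops by roughly (line position)/(number of unmatched clones), which is the heuristic behind the quadratic dependence on $\theta$.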



By definition of the Cut-Off Line Algorithm, if either one of the two 
unmatched clones of a passive vertex was matched in a given step, then 
the other clone is guaranteed to be matched in the next step (either in this 
phase or in a later one), as it is inserted to the top of the stack. This means 
that, for the purpose of determining the number of matched active clones 
throughout the phase, Mj, applying the algorithm with or without passive 
vertices is effectively the same (since one can always identify the two clones 
of each passive vertex, then contract the 2-paths into edges between active 
clones) . 

That said, one must consider the following delicate point. If the
end-of-phase boundary is reached while the top of the stack contains a passive
clone, this corresponds to a path whose one endpoint is an active clone $(u, i)$,
yet its other endpoint is a passive clone. In this case, the active clone should
not be considered as matched when disregarding all passive clones. Let $A_j$
denote this event for phase $j$, and define

\[ M'_j = M_j - \mathbf{1}_{A_j} \]

to be the number of $j$-active clones that are matched during phase $j$ while
disregarding (contracting) the passive clones.

Combining the above observation with the fact that phase $j$ began with
$N_j$ active clones and a cut-off line at $\mu = (1-\beta)^{j-1}\lambda$ and ended as soon as
the cut-off line reached $(1-\beta)\mu$, we apply Lemma 7.1 for $\theta = \theta_0 = 1-\beta$
and conclude that for some constant $c > 0$,

\[ P\big(|M'_j - (1-(1-\beta)^2)N_j| \ge \Delta \mid N_j\big) \le 2\exp\Big[-c\Big(\Delta \wedge \frac{\Delta^2}{N_j}\Big)\Big] . \tag{7.2} \]

Let $B_j$ denote the number of vertices which have precisely 2 clones to the
left of the end-of-phase boundary of phase $j$, and at least 1 more clone in
the interval of phase $j$. Note that such a vertex is clearly $j$-active,
and it would become $(j+1)$-passive if and only if the aforementioned
2 clones are unmatched by the end of phase $j$. In this case, two formerly
active clones will be relabeled as passive. In particular, the number of clones
that transition from being $j$-active to being $(j+1)$-passive is at most $2B_j$.

On the other hand, a clone can transition from being $j$-passive to being
$(j+1)$-active if and only if it happened to be at the top of the stack when
phase $j$ ended, and in particular, the event $A_j$ occurred.

Adding these two, along with the number of $j$-active clones matched in
this phase $M_j$, we conclude that the number of $(j+1)$-active clones satisfies

\[ N_{j+1} \ge N_j - M_j + \mathbf{1}_{A_j} - 2B_j = N_j - M'_j - 2B_j . \tag{7.3} \]

Note that, as long as Step 2 of the algorithm is not reached, the stack
always consists of light clones exclusively. Therefore, up till that point, if a
vertex has 2 unmatched clones, both will remain unmatched until the cut-off
line reaches one of them. Suppose that phase $j_0$ is the first one where the
algorithm invoked Step 2. In that case, for any $j < j_0$, the vertices counted
in $B_j$ are precisely those that were $j$-active yet became $(j+1)$-passive. We
deduce that (7.3) is in fact an equality for all $j < j_0$.

In order to analyze $B_j$, for each $v \in V$ and $0 \le \theta \le \theta' \le 1$ let $d_v(\theta, \theta')$
denote the number of $v$-clones whose assigned value belongs to $[\theta\lambda, \theta'\lambda)$.
Further let $d_v(\theta) = d_v(0, \theta)$. Recall that phase $j$ begins with the line at
$(1-\beta)^{j-1}\lambda$ and ends with the line at $(1-\beta)^j\lambda$. Hence, for $\theta_j = (1-\beta)^{j-1}$,
we have by the definition of $B_j$ that

\[ B_j = \sum_{v \in V} \mathbf{1}_{\{d_v(\theta_{j+1}) = 2,\; d_v(\theta_{j+1}, \theta_j) \ge 1\}} . \]

Observe that $(d_v(\theta_{j+1}), d_v(\theta_{j+1}, \theta_j))$ for $v \in V$ are i.i.d. pairs of independent
Poisson random variables with means $\theta_{j+1}\lambda$ and $(\theta_j - \theta_{j+1})\lambda$ respectively.

Applying Chernoff's bound (cf., e.g., [4]) we have that for some ci > 0, 

- (^i+^i^)' e-e.+iA(i _ ^-l^e.x^ > < 2exp ( - ci|i) . (7.4) 

Combined with (7.2) and (7.3), we arrive at the following estimated lower
bound for $N_{j+1}$:



n 



Applying this inductively, we expect that the following would be a lower
bound for $N_j$:

\[ \theta_j^2\lambda\big(1 - \lambda e^{-\theta_j\lambda}\big)n \tag{7.5} \]

(indeed, this is later shown in Lemma 7.3).

Suppose that the algorithm is at the beginning of phase $j$, and Step 2
has not been reached yet (in any of the phases thus far). By the discussion
above, any $j$-active vertex has either 1 or strictly more than 2 clones with
values in $(0, \theta_j\lambda)$. In particular, the number of $j$-active clones that are heavy
at the beginning of phase $j$ can then be written as

\[ H_j = \sum_{v \in V} d_v(\theta_j)\mathbf{1}_{\{d_v(\theta_j) > 2\}} , \]

and the number of light clones at the start of phase $j$ is then precisely

\[ L_j = N_j - H_j . \]

In general (once Step 2 is invoked), $H_j$ is an upper bound for the number of
$j$-active clones that are heavy at this point. (Note that the only reason for
this bound not to be tight is on account of clones that are already matched.
That is, $\sum_{v \in V}\mathbf{1}_{\{d_v(\theta_j) > 2\}}$ counts all $j$-active heavy vertices, in addition to
perhaps some whose clones are all matched by phase $j$.) Hence, $L_j$ is always
a lower bound for the number of light clones at the start of phase $j$.
We need the following large deviation inequality: 

Lemma 7.2 ([23, Corollary 4.2]). Let Xi, . . . , be a sequence of inde- 
pendent random variables. Suppose E[Xi] = /Xj and there are hi, di and 
such that E[(Xj — ^j)^] < hi, and 



E 



{Xi - li.fS^^-^'^A < di for all < lel < ^0 



If ^Co Si^i di < X^i^i bi for some < 6 < 1, then 
for all A > 0. 
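Since $d_v(\theta_j)$ are i.i.d. $\mathrm{Po}(\theta_j\lambda)$ variables, an application of the above lemma
gives that for some constant $c_2 > 0$,

\[ P\Big(\big|H_j - \theta_j\lambda\big(1 - e^{-\theta_j\lambda} - \theta_j\lambda e^{-\theta_j\lambda}\big)n\big| \ge \Delta\Big) \le 2\exp\Big[-c_2\Big(\Delta \wedge \frac{\Delta^2}{\theta_j^3 n}\Big)\Big] . \tag{7.6} \]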
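Recalling that $m$ is an integer with $(1-\beta)^{m-1} = \theta_\lambda + \gamma/\sqrt{\theta_\lambda n}$ (see (7.1)), set

\[ \Delta_j = \frac{1}{100}\,\gamma\sqrt{\theta_j^3 n}\,\sum_{i=1}^{j}(1-\beta)^{(2j-i-m)/4} . \tag{7.7} \]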



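Observe that the following sequence is increasing in $j$:

\[ (1-\beta)^{j/4}\theta_j^{-3/2} = (1-\beta)^{3/2}(1-\beta)^{-5j/4} . \]

We then get that for all $j \in [m]$,

\[ \gamma(1-\beta)^{(j-m)/4}(\theta_j^3 n)^{-1/2} \le \frac{\gamma}{\sqrt{\theta_m^3 n}} = o(1) , \tag{7.8} \]

where the last equality used the facts $\gamma = o(\sqrt{\theta_\lambda^3 n})$ and

\[ \theta_m = (1-\beta)^{m-1} = \theta_\lambda + \frac{\gamma}{\sqrt{\theta_\lambda n}} = (1 + o(1))\theta_\lambda . \tag{7.9} \]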



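It then follows from (7.8) that for any $j \in [m]$,

\[ \Delta_j = \frac{1}{100}\,\gamma\sqrt{\theta_j^3 n}\,\sum_{i=1}^{j}(1-\beta)^{(2j-i-m)/4} = \frac{1}{100}\,(1-\beta)^{(j-m)/4}\gamma\sqrt{\theta_j^3 n}\,\sum_{i=1}^{j}(1-\beta)^{(j-i)/4} = o(\theta_j^3 n) . \tag{7.10} \]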



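Recalling (7.5) and (7.6), we will next establish lower bounds for the $N_j$'s
and $L_j$'s in terms of the following parameters:

\[ n_j = \theta_j^2\lambda\big(1 - \lambda e^{-\theta_j\lambda}\big)n , \qquad l_j = \theta_j\lambda\big(\theta_j - 1 + e^{-\theta_j\lambda}\big)n \qquad \text{for } j \in [m] . \tag{7.11} \]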



Lemma 7.3. There exists a constant $c > 0$ such that the following holds:
\[ P(\exists j \in [m] : N_j \le n_j - \Delta_j) \le e^{-c\gamma^2} , \]
\[ P(\exists j \in [m] : L_j \le l_j - 2\Delta_j) \le e^{-c\gamma^2} . \]

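Proof. With (7.4) in mind, define the following for each $j \in [m]$:

\[ b_j = \frac{(\theta_{j+1}\lambda)^2}{2}\,e^{-\theta_{j+1}\lambda}\big(1 - e^{-\beta\theta_j\lambda}\big)n , \qquad \gamma_j = (1-\beta)^{(j-m)/4}\gamma . \tag{7.12} \]

It is clear (see definition (7.11)) that

\[ n_{j+1} = (1-\beta)^2 n_j - 2b_j , \tag{7.13} \]

and furthermore, by (7.8), we have that

\[ \gamma_j = o\big(\sqrt{\theta_j^3 n}\big) . \tag{7.14} \]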
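
The recursion (7.13) is an exact algebraic identity and can be checked numerically. The sketch below assumes the forms $\theta_j = (1-\beta)^{j-1}$, $n_j = \theta_j^2\lambda(1-\lambda e^{-\theta_j\lambda})n$ and $b_j = \tfrac{1}{2}(\theta_{j+1}\lambda)^2 e^{-\theta_{j+1}\lambda}(1-e^{-\beta\theta_j\lambda})n$, i.e. $b_j$ is taken to be the expected number of vertices with exactly 2 clones below the end-of-phase boundary and at least 1 clone inside the phase interval. The parameter values and function names are arbitrary choices for illustration.

```python
import math

def theta(j, beta):
    """Assumed phase parameter: theta_j = (1 - beta)^(j - 1)."""
    return (1 - beta) ** (j - 1)

def n_j(j, beta, lam, n):
    """Assumed centering of N_j: theta_j^2 * lam * (1 - lam * exp(-theta_j * lam)) * n."""
    th = theta(j, beta)
    return th ** 2 * lam * (1 - lam * math.exp(-th * lam)) * n

def b_j(j, beta, lam, n):
    """Assumed E[B_j]: P(Po(theta_{j+1} lam) = 2) * P(Po(beta theta_j lam) >= 1) * n."""
    th, th_next = theta(j, beta), theta(j + 1, beta)
    p_two_below = 0.5 * (th_next * lam) ** 2 * math.exp(-th_next * lam)
    p_one_inside = 1 - math.exp(-beta * th * lam)
    return p_two_below * p_one_inside * n

if __name__ == "__main__":
    beta, lam, n = 0.3, 1.05, 10 ** 6
    for j in range(1, 8):
        lhs = n_j(j + 1, beta, lam, n)
        rhs = (1 - beta) ** 2 * n_j(j, beta, lam, n) - 2 * b_j(j, beta, lam, n)
        print(j, lhs, rhs)
```

The two printed columns should agree up to floating-point error, since $e^{-\theta_{j+1}\lambda}(1-e^{-\beta\theta_j\lambda}) = e^{-\theta_{j+1}\lambda} - e^{-\theta_j\lambda}$.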
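For $\ell \in [m]$, decomposing the events in the required lower bound on $N_j$ gives

\[ P(\exists j \in [\ell] : N_j \le n_j - \Delta_j) = P(\exists j \in [\ell-1] : N_j \le n_j - \Delta_j) + P\Big(\bigcap_{j \in [\ell-1]}\{N_j > n_j - \Delta_j\} \cap \{N_\ell \le n_\ell - \Delta_\ell\}\Big) , \]

as well as

\[ P\Big(\bigcap_{j \in [\ell-1]}\{N_j > n_j - \Delta_j\} \cap \{N_\ell \le n_\ell - \Delta_\ell\}\Big) \le P\big(N_{\ell-1} > n_{\ell-1} - \Delta_{\ell-1} ,\; N_\ell \le n_\ell - \Delta_\ell\big) = P_\ell . \]

Recall that $\gamma_j$ is decreasing in $j$, thereby for any constant $c'_1 > 0$ there exists
a constant $c'_2 > 0$ such that

\[ \sum_{j=1}^{m} e^{-c'_1\gamma_j^2} \le e^{-c'_2\gamma^2} . \]

It will thus suffice to show that $P_j \le e^{-c'\gamma_j^2}$ for some constant $c' > 0$ and
every $j \in [m]$.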

For $j = 1$, recall that $N_1$ is the number of active clones at the beginning of
the algorithm (since all clones are initially unmatched, the passive vertices
are those with precisely 2 clones, and all other vertices are active) and
$n_1 = \lambda(1 - \lambda e^{-\lambda})n = \mathbb{E}N_1$. In addition, $\Delta_1 = \frac{1}{100}\gamma_1\sqrt{n}$ and $\gamma_1 \to \infty$
with $n$ (by the fact that $(1-\beta)^{-m} \asymp 1/\epsilon \to \infty$). Hence, by a standard
application of the Central Limit Theorem to the i.i.d. random variables
defined by the number of active clones that each vertex contributes, we
deduce that $P_1 \le e^{-c_0\gamma_1^2}$ for some $c_0 > 0$ fixed.

Next, consider $P_{j+1}$ for $j \in \{1, \ldots, m-1\}$. Combining (7.3) with (7.13)
we get that for each such $j$

\[ N_{j+1} - n_{j+1} \ge N_j - (1-\beta)^2 n_j - M'_j - 2(B_j - b_j) \]
\[ = (1 - (1-\beta)^2)N_j - M'_j + (1-\beta)^2(N_j - n_j) - 2(B_j - b_j) . \]

Therefore, the event addressed in $P_{j+1}$ implies that

\[ \Delta_{j+1} \le n_{j+1} - N_{j+1} \le M'_j - (1 - (1-\beta)^2)N_j + (1-\beta)^2\Delta_j + 2(B_j - b_j) , \]

and also $N_j > n_j - \Delta_j$. In particular, $P_{j+1}$ is at most the probability that

\[ M'_j - (1 - (1-\beta)^2)N_j + 2(B_j - b_j) \ge \Delta_{j+1} - (1-\beta)^2\Delta_j , \quad \text{and} \quad N_j > n_j - \Delta_j . \]
Adding the fact that, by definitions (7.7) and (7.12),

\[ \Delta_{j+1} = (1-\beta)^2\Delta_j + \frac{\gamma_{j+1}}{100}\sqrt{\theta_{j+1}^3 n} , \]

we deduce that

+ P (Mj - (1 - (1 - fif)N, > I > n, - A,) . 

Combining (7.4) and (7.14), we can obtain an upper bound of $e^{-c_1\gamma_{j+1}^2}$ on the
first term, for some constant $c_1 > 0$. At the same time, (7.2) provides an
upper bound of $e^{-c_2\gamma_{j+1}^2}$ on the second term, for some $c_2 > 0$ fixed.

Altogether, we have shown the desired upper bound on $P_j$ for all $j \in [m]$,
implying the first inequality in the lemma.
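
The one-step recursion for the error terms used in the induction can also be sanity-checked numerically. The sketch below assumes $\theta_j = (1-\beta)^{j-1}$, $\gamma_j = (1-\beta)^{(j-m)/4}\gamma$ and $\Delta_j = \frac{1}{100}\gamma\sqrt{\theta_j^3 n}\sum_{i=1}^{j}(1-\beta)^{(2j-i-m)/4}$, and verifies $\Delta_{j+1} = (1-\beta)^2\Delta_j + \frac{1}{100}\gamma_{j+1}\sqrt{\theta_{j+1}^3 n}$ together with a geometric-series bound on $\Delta_j$. All numeric values and function names are illustrative.

```python
import math

def theta(j, beta):
    """Assumed phase parameter: theta_j = (1 - beta)^(j - 1)."""
    return (1 - beta) ** (j - 1)

def gamma_j(j, m, beta, gamma):
    """Assumed per-phase deviation scale: (1 - beta)^((j - m) / 4) * gamma."""
    return (1 - beta) ** ((j - m) / 4) * gamma

def delta_j(j, m, beta, gamma, n):
    """Assumed error term: gamma * sqrt(theta_j^3 n) / 100 times a geometric sum."""
    s = sum((1 - beta) ** ((2 * j - i - m) / 4) for i in range(1, j + 1))
    return gamma * math.sqrt(theta(j, beta) ** 3 * n) * s / 100

if __name__ == "__main__":
    m, beta, gamma, n = 12, 0.3, 5.0, 10 ** 8
    for j in range(1, m):
        lhs = delta_j(j + 1, m, beta, gamma, n)
        step = gamma_j(j + 1, m, beta, gamma) * math.sqrt(theta(j + 1, beta) ** 3 * n) / 100
        rhs = (1 - beta) ** 2 * delta_j(j, m, beta, gamma, n) + step
        print(j, lhs, rhs)
```

The geometric sum is what keeps $\Delta_j$ within a constant factor of $\gamma_j\sqrt{\theta_j^3 n}$, with the constant depending only on $\beta$.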

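Recall now that inequality (7.6) gives that for some constants $c_3, c_4 > 0$,

\[ P\Big(\exists j \in [m] : \big|H_j - \theta_j\lambda\big(1 - e^{-\theta_j\lambda} - \theta_j\lambda e^{-\theta_j\lambda}\big)n\big| \ge \Delta_j\Big) \le 2\sum_{j=1}^{m} e^{-c_3\gamma_j^2} \le e^{-c_4\gamma^2} . \]

Combining this with the fact $L_j \ge N_j - H_j$, as well as the above lower bound
on $N_j$, yields the second statement of the lemma, as required. ■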

We can now derive a lower bound on the first time that Step 2 is applied
(that is, the first time at which there are no light clones). Equivalently, this
gives an upper bound on $\Lambda_C$ (the x-coordinate of the cut-off line at that
point).

By the definition, the number of light clones throughout the algorithm 
has the following property: 

• As long as there are light clones in the stack, in each step one of them
will be popped and matched, and as a result, at most one new light
clone will be created.

• If there are no light clones in the stack (and the algorithm is not
concluded) then following Step 2 the stack will necessarily be comprised
of a single heavy clone. This clone will then be popped in the
next iteration of Step 1, while creating at most two new light clones.

That is to say, once the number of light clones drops to 0, it can never again
exceed 2. In particular, if all the light clones disappear for the first time
during phase $j$ for some $j = 1, \ldots, m-1$, we must have that $L_m \le 2$ (since
$L_j$ is a lower bound on the number of light clones at the start of phase $j$).
By (7.11) we have that

\[ l_m = \theta_m\lambda\big(\theta_m - 1 + e^{-\theta_m\lambda}\big)n . \]

By the definition of $\theta_\lambda$ and its asymptotic behavior (see (3.1), (3.2)), as well
as the fact that $\theta_m = (1 + o(1))\theta_\lambda$, we have that for all $\theta_\lambda \le x \le \theta_m$ the
function $f(x) = x - 1 + e^{-\lambda x}$ satisfies

\[ f'(x) = 1 - \lambda e^{-\lambda x} = 1 - \lambda e^{-(1+o(1))\theta_\lambda\lambda} = 1 - \lambda(1-\theta_\lambda)(1 - o(\theta_\lambda)) = (1 - o(1))\epsilon . \]

Since $f(\theta_\lambda) = 0$ and $\theta_m = \theta_\lambda + \gamma/\sqrt{\theta_\lambda n} = (2 + o(1))\epsilon$ (see (7.9)), we can apply
the Mean Value Theorem and get

\[ l_m = \theta_m\lambda f(\theta_m)n = \theta_m\lambda \cdot (1 - o(1))\epsilon\,\frac{\gamma}{\sqrt{\theta_\lambda n}}\,n = \Big(\frac{1}{2} - o(1)\Big)\gamma\sqrt{\theta_\lambda^3 n} . \tag{7.15} \]
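
The asymptotics used for $f$ can be checked numerically. The sketch below assumes that $\theta_\lambda$ is the positive root of $1 - \theta = e^{-\theta\lambda}$ with $\lambda = 1 + \epsilon$, finds it by bisection, and compares $\theta_\lambda$ with $2\epsilon$ and $f'(\theta_\lambda) = 1 - \lambda e^{-\lambda\theta_\lambda}$ with $\epsilon$. The function name `theta_lambda` and the bracketing interval are choices made for this illustration.

```python
import math

def theta_lambda(lam):
    """Positive root of 1 - theta = exp(-theta * lam) for lam > 1, by bisection."""
    def g(th):
        # g(th) = 1 - th - exp(-th * lam), written with expm1 for accuracy.
        return -math.expm1(-th * lam) - th
    # g is positive just right of 0 and negative at 1; lam - 1 lies left of the root.
    lo, hi = lam - 1, 1.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

if __name__ == "__main__":
    for eps in (0.1, 0.01, 0.001):
        lam = 1 + eps
        th = theta_lambda(lam)
        slope = 1 - lam * math.exp(-lam * th)   # f'(theta_lambda)
        print(eps, th / (2 * eps), slope / eps)
```

Both printed ratios should approach 1 as $\epsilon$ decreases, in line with $\theta_\lambda = (2 + o(1))\epsilon$ and $f'(x) = (1 - o(1))\epsilon$.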
On the other hand, by (7.7) (and recalling requirement (3.3) on $\beta$)

m ^ 

Am = — a/S V(l - /3)("^-*)/^ < — , ,,,, 

i=l 

m^ i- a- amy/' '-l^'- 

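where the last two inequalities hold for any large $n$. As $\theta_m = (1 + o(1))\theta_\lambda$
and $\theta_\lambda^3 n \to \infty$ with $n$, we immediately have that $l_m \to \infty$ as well. However,
Lemma 7.3 gives that

\[ P(L_m \le 2) \le P(L_m \le l_m - 2\Delta_m) \le e^{-c\gamma^2} \]

for some fixed $c > 0$. By the above discussion, this translates into an upper
bound on $\Lambda_C$:

\[ P\Big(\Lambda_C \ge \theta_\lambda\lambda + \frac{\gamma}{\sqrt{\theta_\lambda n}}\Big) \le e^{-c\gamma^2} . \tag{7.17} \]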

The above upper bound on $\Lambda_C$ ensures that Step 2 is not applied in the
first $m-1$ phases, except with probability $\exp(-c\gamma^2)$. Recall that, if Step 2
has not yet been applied in phases $1, \ldots, j-1$ then our lower bound (7.3)
for $N_j$ is in fact an equality, and similarly, $L_j$ is precisely the number of
light clones at the beginning of phase $j$. Therefore, assuming that indeed
Step 2 was not applied in phases $1, \ldots, m-1$ (we account for the above
error probability), we may now apply the same proof of Lemma 7.3, this
time with respect to the events $\{N_j \ge n_j + \Delta_j\}$ and $\{L_j \ge l_j + 2\Delta_j\}$. This
gives the following matching upper bounds on the $N_j$'s and $L_j$'s:

Lemma 7.4. There exists a constant $c > 0$ such that the following holds:
\[ P(\exists j \in [m] : N_j \ge n_j + \Delta_j) \le e^{-c\gamma^2} , \]
\[ P(\exists j \in [m] : L_j \ge l_j + 2\Delta_j) \le e^{-c\gamma^2} . \]
To obtain the required lower bound on $\Lambda_C$, assume that Step 2 was
not applied in phases $1, \ldots, m-1$ (this happens except with probability
$\exp(-c\gamma^2)$). We now wish to show that all light clones will disappear shortly
after commencing phase $m$ with probability at least $1 - \exp(-c\gamma^2)$.

Since we did not apply Step 2 yet, the stack exclusively contains light
clones, and a clone is active (more precisely, $m$-active) iff it has 3 unmatched
clones or more. Now, if we ignore the passive clones, the algorithm must
remove at least 2 light clones from the stack in order to create a new light
clone.

Suppose that at the beginning of phase m, the stack contains k light 
clones. By the above discussion, after matching k light clones, the stack will 
be of size at most k/2. Iterating, it follows that after matching at most 2k 
active clones, every light clone will disappear (the stack will be exhausted). 
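
The halving argument amounts to a geometric series. As a minimal sketch, assume the worst case in which each round matches every light clone currently in the stack and leaves exactly half as many (rounded down) new light clones; the total number of matches is then at most $2k$. The function name `worst_case_matches` is hypothetical.

```python
def worst_case_matches(k):
    """Total light clones matched if each round of size s leaves floor(s / 2) new ones."""
    total = 0
    while k > 0:
        total += k   # match all light clones currently in the stack
        k //= 2      # at most half as many new light clones are created
    return total

if __name__ == "__main__":
    for k in (1, 10, 1000, 10 ** 6):
        print(k, worst_case_matches(k), 2 * k)
```

For instance, starting from 1000 light clones the rounds have sizes 1000, 500, 250, 125, 62, 31, 15, 7, 3, 1, which sum to 1994.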

By Lemma 7.4 and the fact that $l_m + 2\Delta_m < 2\gamma\sqrt{\theta_\lambda^3 n}$ for any large $n$ with
room to spare (as established in (7.15), (7.16)), there are $L_m \le 2\gamma\sqrt{\theta_\lambda^3 n}$ light
clones at the beginning of phase $m$, except with probability $e^{-c\gamma^2}$. Combined
with the above argument, we conclude that the stack of light clones
will be exhausted after matching at most $4\gamma\sqrt{\theta_\lambda^3 n}$ active clones, except with
the above error probability.

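We will next show that at least this many active clones will be matched
by the time the cut-off line reaches the point $\theta_m\lambda - 10\gamma/\sqrt{\theta_\lambda n}$. Since $\theta_m \ge \theta_\lambda$,
by definition (7.11) we have

\[ n_m = \theta_m^2\lambda\big(1 - \lambda e^{-\theta_m\lambda}\big)n \ge \theta_m^2\lambda\big(1 - \lambda e^{-\theta_\lambda\lambda}\big)n = \theta_m^2\lambda\big(1 - \lambda(1-\theta_\lambda)\big)n \]
\[ = \theta_m^2\lambda\big(-\epsilon + (2 + o(1))\epsilon\big)n = (1 + o(1))\theta_m^2\epsilon n . \]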



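Together with Lemma 7.3, we deduce that for a sufficiently large $n$, there are
$N_m \ge \frac{1}{4}\theta_\lambda^3 n$ unmatched active clones at the beginning of phase $m$, except
with probability $e^{-c\gamma^2}$. Note that, by the assumption on $\gamma$,

\[ \theta_m\lambda - \frac{10\gamma}{\sqrt{\theta_\lambda n}} = \Big(1 - \frac{10\gamma}{(1 + o(1))\sqrt{\theta_\lambda^3 n}}\Big)\theta_m\lambda = (1 - o(1))\theta_m\lambda . \]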

Since the boundary marking the end of phase $m$ is at $(1-\beta)\theta_m\lambda$ (and $\beta$
is bounded away from 0), the cut-off line moves through the entire interval
between $\theta_m\lambda$ and the above point as part of phase $m$. Hence, we can
use the original version of the Cut-Off Line Algorithm in order to analyze
the number of active clones that are matched along this interval (without
considering a potential change of phase).
Applying Lemma 7.1 with $\theta = 1 - 10\gamma/(\theta_m\lambda\sqrt{\theta_\lambda n})$ and $k = \frac{1}{4}\theta_\lambda^3 n$, we can now
deduce that, except with probability $\exp(-c\gamma^2)$, we match at least

\[ (1-\theta^2)k = (20 - o(1))\frac{\gamma}{\sqrt{\theta_\lambda^3 n}}\,k \ge (5 - o(1))\gamma\sqrt{\theta_\lambda^3 n} \]

active clones. Therefore,

\[ P\Big(\Lambda_C \le \theta_\lambda\lambda - \frac{10\gamma}{\sqrt{\theta_\lambda n}}\Big) \le e^{-c\gamma^2} . \]

Combining this bound with (7.17) completes the proof of (3.4). ■